Authoritative Data is Not More Right Just Because It’s Authoritative
Posted by tordans on 1 October 2025 in English.

HeiGIT recently published an analysis together with the German Federal Agency for Cartography and Geodesy (BKG), comparing land cover data from OSM with the official CORINE Land Cover (CLC) dataset from BKG.
I want to use this opportunity to make an appeal to HeiGIT and similar projects analyzing OSM data: just because data is published by an authoritative (mostly government) source does not make it more correct than OSM.
I’ve often observed that OSM is compared to external datasets, and the analysis is framed around the question of whether OSM is “right.” This framing does OSM a disservice, because it implies that wherever the datasets differ, OSM is wrong and the other dataset is right.
In reality, every open dataset I have compared with OSM (bicycle parking, public parking spaces, buildings, cycling infrastructure, cycling routes) contained errors, and so did OSM. The lesson: who publishes a dataset has no inherent influence on its quality.
Of course, this does not mean such comparisons should be avoided. They are very useful and important. But I urge that the way these analyses are communicated be reconsidered. The communication must make clear that such comparisons are evaluations of both datasets, aimed at finding similarities and differences. It must be explicit that this is not an evaluation of correctness.
Correctness can only be checked against ground truth, usually via sample analysis. That is a lot of work, but it is the only approach that genuinely allows data quality to be assessed.
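To make that concrete, here is a minimal sketch of what drawing such a sample could look like. The bounding box and sample size are invented for illustration; each sampled location would then be surveyed on the ground and compared against both datasets, not just OSM:

```python
import random

# Hypothetical study area (min_lon, min_lat, max_lon, max_lat), roughly Berlin.
BBOX = (13.08, 52.33, 13.76, 52.68)
SAMPLE_SIZE = 100

random.seed(42)  # fixed seed so the sample itself can be audited

# Random point locations to be checked on the ground. For a small region,
# uniform sampling in lon/lat is a reasonable approximation of an
# area-uniform sample; globally it would not be (see the last comment below).
sample = [
    (
        random.uniform(BBOX[0], BBOX[2]),  # longitude
        random.uniform(BBOX[1], BBOX[3]),  # latitude
    )
    for _ in range(SAMPLE_SIZE)
]
```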
At this point, it would also be valuable for such analyses not only to acknowledge that all datasets contain errors, but also to highlight one of the central advantages of OSM compared to other datasets: how errors are handled once they are found.
No other open dataset I know of has such a clear and straightforward process for dealing with data errors. I know of many datasets in Berlin, for example, that have contained errors for years, yet I have never seen a documented correction process for any of them. In most cases, the data is treated as “frozen” and updates are neither planned nor possible.
In contrast, OSM has simple and transparent processes: one can leave a public note, or—once familiar with editing—even correct the data directly.
That may sound trivial, but it is one of OSM’s core advantages over authoritative data. Any analysis comparing OSM should also point out this aspect, especially if it frames other data as “the truth.”
I am certain that a systematic sample check would reveal many cases where OSM is actually more current and more correct than the respective authoritative dataset.
These thoughts have come up for me many times when reading about dataset comparisons. Sometimes it is because a study aggressively focuses on OSM quality—which is fine and necessary—but fails to stress that quality evaluation is not just relevant for OSM, but for any dataset. This was also the case in the summary of a 2023 Bachelor’s thesis published on the HeiGIT blog: “Using OSM for location analyses of residential real estate projects: An extrinsic analysis of data quality”.
But the trigger for this particular post is the new HeiGIT publication. Let’s take a closer look at some examples that I find problematic:
In the introduction, it says:
“The dataset works as a benchmark to evaluate the differences between OSM land use classes and the actual classification.”
The phrase “actual classification” suggests to me that the comparison dataset is inherently correct, and OSM has to be measured against it. This framing is unfair and harmful to OSM. What should be written instead, in my view, is something like “reference classification” or more precisely “classification of dataset XYZ.”
In the dashboard screenshots, we find statements like:
“This suggests that OSM does capture some land cover categories but may lack detail or accuracy.”
and:
“The F1-Score is used as a statistical metric that considers both the correctness and completeness of the land cover categories weighted by area.”
Here OSM is again framed as the potentially wrong dataset, while the reference dataset is presented as the truth. I cannot see how, based on the analysis described, such a conclusion can be justified.
The only accurate statement possible here, in my view, is that there are differences between the datasets. We do not know which dataset is more correct, more complete, or more “true.”
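To illustrate, here is a minimal sketch (with invented labels, using scikit-learn) of what such an F1 comparison boils down to. Per class, F1 = 2*TP / (2*TP + FP + FN) is symmetric in the two inputs, since swapping them merely swaps false positives and false negatives; only the choice of which dataset is passed as “truth” (and the class weighting derived from it) frames one of them as the reference:

```python
from sklearn.metrics import f1_score

# Invented land cover labels for the same six sample areas.
clc = ["forest", "urban", "water", "forest", "urban", "forest"]
osm = ["forest", "urban", "urban", "forest", "urban", "forest"]

labels = ["forest", "urban", "water"]

# Per-class F1 is identical no matter which dataset plays "ground truth":
print(f1_score(clc, osm, labels=labels, average=None, zero_division=0))
print(f1_score(osm, clc, labels=labels, average=None, zero_division=0))
# -> [1.0, 0.8, 0.0] both times: it measures agreement, not correctness.

# The aggregation step weights classes by the support of whichever dataset
# is passed as "truth", and that is where the asymmetry creeps in:
print(f1_score(clc, osm, average="weighted", zero_division=0))  # ~0.77
print(f1_score(osm, clc, average="weighted", zero_division=0))  # 0.90
```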
The same problem applies to positively phrased statements like:
“This suggests that OSM provides a reliable representation of land cover.”
This may frame OSM in a better light, but it is still methodologically flawed in the same way.
Later in the blogpost, the framing appears again:
“It provides a valuable tool for researchers, the OSM community and public administration to understand the reliability of OSM land use data.”
Again, OSM is presented as the potentially faulty dataset that must prove itself against the reference.
And this, for me, is the most important point: OSM no longer has to prove itself. It has shown that in many respects it is more current, more complete, and more dynamic than many authoritative datasets.
In fact, OSM achieves something that many authoritative datasets never will: largely unified data coverage for the whole world. Combine this with OSM’s ability to update rapidly and its transparent processes, and it becomes clear how mature OSM really is. Mature enough to be compared on equal terms—not as the dataset that must be corrected or tested for errors, but as the dataset that can be improved together with others through comparisons.
In closing: I am certain that HeiGIT and BKG know and appreciate OSM’s quality. Otherwise, such a well-executed analysis project would not have happened in the first place. Still, I believe the framing of such analyses needs to be adjusted.
Discussion
Comment from SimonPoole on 1 October 2025 at 12:27
As you point out, this is not in any way new.
Dismissive language is used in essentially every paper I have ever read from Heidelberg, in particular the widespread use of “only” to describe statistical results from OSM, both about the data and about participation. Yet there is never a comparison with another dataset that would show that the specific number is actually worse than expected.
Comment from tordans on 3 October 2025 at 08:52
FYI, there is more discussion of this post in the comments at https://en.osm.town/@tordans/115297914265203391
Comment from tordans on 10 October 2025 at 07:29
UPDATE: Last weekend, at the FOSSGIS 25-year event and in subsequent chats, I continued the discussion around this topic and want to add a few points:
It is, of course, correct and appropriate to describe datasets as different. What this is really about is how one dataset is presented as right and, as a consequence, the other as wrong, incomplete, or inferior.
An evaluation of a dataset must always take place in the context of a specific use case. For a given application, one can then make a recommendation that a certain dataset is better suited for that purpose — and explain why.
In the parts of the new HeiGIT analysis that I’ve read so far, I did not notice a description of such a use case.
The quality of the comparison dataset is also still unclear to me and others. Since its age, methodology, and limitations (for example, the level of detail) are crucial for assessing the results, it would be important to highlight these aspects — including in the short summary displayed above the charts in the tool. That section already links to the comparison methodology (which is good); it should also provide access to this contextual information, as it is essential for interpretation.
On another level: HeiGIT conducts many analyses within the OSM ecosystem and is widely seen as an expert in the analysis and quality assessment of OSM data. With that comes a particular responsibility to present the strengths and weaknesses of OSM — and of its alternatives — carefully and fairly.
And to stress it once again: evaluating and scrutinizing OSM without also evaluating the alternative datasets already distorts perception and is unfair to OSM. This becomes particularly clear in the Bachelor’s thesis mentioned earlier: OSM was evaluated carefully and thoroughly, which is great. However, the study does not address what alternative data sources exist and how they perform.
The result is that the takeaway becomes something like: “Caution — OSM isn’t entirely clean, be careful.” Instead of what would be a fairer conclusion: “OSM is an excellent data source — in fact, often the only one that provides reasonably good, up-to-date data with transparent processes for improvement. But like any open dataset, it may require regional updates and cleanups — for which processes already exist.”
Without this broader context, the impression remains one-sided. For many similar use cases, there are simply no real alternatives — and when alternatives do exist, they typically require at least as much manual correction, while offering poorer tools and less transparent processes.
In that sense, I would be very happy to see OSM framed more neutrally — or even positively — in future analyses, and to see an honest discussion of the weaknesses of other datasets as well.
Comment from imagico on 10 October 2025 at 13:57
You have not even mentioned the more fundamental issue with comparing different classification systems: such comparisons always re-cast one classification into another and in this way create an inherent bias. And when OSM data is compared, it is always the OSM classification that gets re-cast into the other one, with the inevitable losses in semantic accuracy.
W.r.t. ground truth sampling as the gold standard for quality assessment: that is only the case if you actually use true unbiased random sampling, and I have yet to find a single study that does. Most studies do not even document how sampling locations were selected and hence qualify as pseudo-science. (For a global analysis, for example, picking a uniformly distributed set of random sampling locations is non-trivial, not to mention that ground checking a truly random set of locations is very expensive.) By manipulating the sampling, even subtly, you can essentially freely modify the results of such a study.
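To illustrate the non-trivial part, here is a minimal sketch (using NumPy, with an arbitrary seed and sample size) of drawing area-uniform random points on the globe:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Naive uniform lat/lon sampling clusters points near the poles, because
# a degree of longitude shrinks with cos(latitude). For an area-uniform
# sample on the sphere, draw sin(latitude) uniformly from [-1, 1]:
lon = rng.uniform(-180.0, 180.0, n)
lat = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0, n)))
```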