Improving the OSM map - why don't we? [8]

Posted by marczoutendijk on 1 August 2015 in English. Last updated on 23 April 2017.

Where do we leave our Garbage?

Taginfo is a great tool to see where and how a given key is used on the map. It also gives you nicely formatted tables with statistics on all the tags (a tag is a key=value pair).
Did you know that the most used key is source=*?
On 25 July 2015 it appeared 162,428,193 times, with a total of 143,491 different values. You can find the most common tags here.
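If you have downloaded the taginfo database (see below), the same kind of ranking can be read from its "keys" table. Here is a minimal Python sketch. It assumes that table has key, count_all and values_all columns; values_all as the number of distinct values is my assumption and is not shown in this post. The in-memory rows are made-up placeholders, not real statistics:

```python
import sqlite3


def top_keys(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int, int]]:
    """Return (key, count_all, values_all) for the most-used keys, most-used first."""
    # Assumes the taginfo "keys" table exposes count_all and values_all columns.
    sql = (
        "SELECT key, count_all, values_all FROM keys "
        "ORDER BY count_all DESC LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real database; the numbers are made up.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keys (key TEXT, count_all INTEGER, values_all INTEGER)")
    conn.executemany(
        "INSERT INTO keys VALUES (?, ?, ?)",
        [("source", 300, 40), ("building", 200, 10), ("highway", 100, 25)],
    )
    # Real use would open the downloaded database file with sqlite3.connect(path).
    for key, count, values in top_keys(conn, limit=2):
        print(f"{key}: used {count} times with {values} distinct values")
```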

But I was more interested in keys that appear just once in the database, because I expected many of those "solo" keys to be erroneous. To find out, I downloaded the taginfo database (there is a link on the taginfo page to do precisely that). Be warned: after downloading and expanding that database, you have a 5 GB file in SQLite format to handle! But with the data on my own machine, I could do my research in more detail, and faster, than with the tools on the taginfo site. I opened it in my SQLite client and, after 10 minutes, I had 74,569,089 records on my computer to examine!
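Expanding a download this large works best as a stream, so the whole file never has to fit in memory. Here is a minimal Python sketch, assuming the download is a bzip2-compressed (.bz2) file. The file names in the usage comment are assumptions and are not taken from the taginfo page:

```python
import bz2
import shutil


def expand_bz2(src: str, dst: str, chunk_size: int = 16 * 1024 * 1024) -> None:
    """Decompress the bzip2 file `src` into `dst` without loading it all into memory."""
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        # copyfileobj copies fixed-size chunks, so a multi-GB file is no problem.
        shutil.copyfileobj(fin, fout, chunk_size)


# Usage (hypothetical file names; use whatever the download page offers):
# expand_bz2("taginfo-db.db.bz2", "taginfo-db.db")
```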
The database contains numerous tables of information. One of them, the "keys" table, describes every key in the database, and it helped me find what I was looking for: the count_all field.
Below is this "keys" table with its first 20 entries. The count_all field is the one I needed, and with the necessary code I produced a table of all the keys that appear exactly once (count_all=1).
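The code for that step can be as small as one SQL query. Here is a minimal Python sketch using the standard sqlite3 module. It assumes only the "keys" table and its key and count_all columns described above, and the demo rows are made up to show the sort order:

```python
import sqlite3


def solo_keys(conn: sqlite3.Connection) -> list[str]:
    """Return every key that occurs exactly once (count_all = 1), sorted."""
    # ORDER BY uses SQLite's default BINARY collation (byte order), so a key
    # with a leading space sorts before "+++", which sorts before "129/".
    rows = conn.execute("SELECT key FROM keys WHERE count_all = 1 ORDER BY key")
    return [key for (key,) in rows]


if __name__ == "__main__":
    # In-memory stand-in for the real database, with made-up rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keys (key TEXT, count_all INTEGER)")
    conn.executemany(
        "INSERT INTO keys VALUES (?, ?)",
        [("name", 5000), ("129/", 1), (" source:name", 1), ("+++", 1)],
    )
    print(solo_keys(conn))  # → [' source:name', '+++', '129/']
```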
Here is the beginning of that table, after I sorted it (well, my computer was kind enough to do it for me!). Rather weird names for a key, don't you think? What would "+++" denote? Or "129/"?
The first entry (" source:name") starts with a space character! If you enter that string (or any of the others) into the taginfo search field, you get all the details about that key: its value, how many times it is used (once) and, if you click the "Map" tab, where it is used. You can also click the Overpass link to see its precise location.
So, for that first entry I did all that, and it turned out to be this: somewhere Down Under, in Australia. Try it for yourself!
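To look up a key this odd by hand, you have to quote it. As a sketch, here is how a Python function could build an Overpass QL query for a single key. The public endpoint URL is an assumption, and any Overpass instance would work. The escaping assumes Overpass QL accepts backslash escapes inside quoted strings:

```python
from urllib.parse import urlencode

# Assumption: a public Overpass API instance; other instances work the same way.
OVERPASS_ENDPOINT = "https://overpass-api.de/api/interpreter"


def overpass_query(key: str) -> str:
    """Overpass QL that returns every node, way and relation carrying `key`."""
    # Quote the key so spaces and punctuation survive; escape \ and " first.
    escaped = key.replace("\\", "\\\\").replace('"', '\\"')
    return f'[out:json];nwr["{escaped}"];out center;'


def overpass_url(key: str, endpoint: str = OVERPASS_ENDPOINT) -> str:
    """URL that runs overpass_query(key) on the given endpoint."""
    return endpoint + "?" + urlencode({"data": overpass_query(key)})


print(overpass_query(" source:name"))  # → [out:json];nwr[" source:name"];out center;
```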

Keys are not supposed to start with a space character, but the OSM database accepts anything and does not check what you enter, except for the length (at most 255 characters). Keys are also supposed to consist of letters, possibly with digits, and a few special characters such as "_" and "-" are allowed as well.
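These rules of thumb can be turned into a quick check. Here is a minimal Python sketch. The exact character set is my assumption (ASCII letters, digits, "_", "-" and the colon used in keys like addr:postcode), so treat it as a heuristic and not as an official OSM rule:

```python
import re

# Assumed notion of a "well-formed" key for this check: ASCII letters,
# digits and the separators _ - : (the colon as in addr:postcode).
WELL_FORMED = re.compile(r"[A-Za-z0-9_:\-]+")
DIGITS_ONLY = re.compile(r"[0-9]+")
MAX_LEN = 255  # the length limit mentioned in the text above


def key_problems(key: str) -> list[str]:
    """Return a list of reasons why `key` looks suspicious (empty if none)."""
    problems = []
    if len(key) > MAX_LEN:
        problems.append("longer than 255 characters")
    if key != key.strip():
        problems.append("leading or trailing whitespace")
    if DIGITS_ONLY.fullmatch(key):
        problems.append("digits only")
    elif not WELL_FORMED.fullmatch(key.strip()):
        problems.append("unexpected characters")
    return problems


for k in ["name", " source:name", "09200", "+++", "129/"]:
    print(repr(k), key_problems(k))
```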
But a key with just numbers? What is that? Let's look at 09200: it seems to be the postal code of a village in France. But then it should have been:
addr:postcode=09200
addr:city=Montégut-en-Couserans

Many of the faulty keys I found are of the uppercase/lowercase type:
Name where name was meant, for instance. Almost every regular key (amenity, shop, tourism, highway, landuse, etc.) appears in a misspelled version in the database (tourims, land-use, etc.). Added punctuation (name; or name, or name-) also accounts for quite a number of those one-time-only keys.
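Here is a sketch, in Python with the standard difflib module, of how such variants could be mapped back to the key that was probably meant. The short list of common keys and the 0.8 similarity cutoff are illustrative assumptions, not tuned values:

```python
import difflib
from typing import Optional, Sequence

# Illustrative list of well-known keys; a real check might take the most-used
# keys from the taginfo database instead.
COMMON_KEYS = ("name", "amenity", "shop", "tourism", "highway", "landuse")


def suggest_key(key: str, common: Sequence[str] = COMMON_KEYS, cutoff: float = 0.8) -> Optional[str]:
    """Return the common key that `key` is probably a variant of, or None."""
    # Undo the cheapest mistakes first: stray whitespace, wrong case,
    # and trailing punctuation such as "name;", "name," or "name-".
    normalized = key.strip().lower().rstrip(";,-.")
    if normalized in common:
        return normalized
    # Otherwise fall back to fuzzy string matching for typos like "tourims".
    matches = difflib.get_close_matches(normalized, common, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for odd in ["Name", "name;", "tourims", "land-use", "09200"]:
    print(f"{odd!r} -> {suggest_key(odd)!r}")
```

A suggestion like this is only a hint: it should be shown to a mapper, not applied automatically, because a rare key can be intentional.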
All in all, 19,037 keys appear only once in the database. Out of a total of 54,382 keys, that is more than a third (about 35%)!
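The same share can be computed directly in the database. Here is a minimal Python sketch, assuming the taginfo "keys" table with a count_all column, as above. The demo rows are made up:

```python
import sqlite3


def solo_share(conn: sqlite3.Connection) -> tuple[int, int, float]:
    """Return (keys used once, all keys, fraction) from the "keys" table."""
    # In SQLite a comparison yields 0 or 1, so SUM(count_all = 1) counts matches.
    solo, total = conn.execute("SELECT SUM(count_all = 1), COUNT(*) FROM keys").fetchone()
    solo = solo or 0  # SUM over an empty table returns NULL
    return solo, total, (solo / total if total else 0.0)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE keys (key TEXT, count_all INTEGER)")
    conn.executemany("INSERT INTO keys VALUES (?, ?)", [("name", 5000), ("+++", 1), ("highway", 900)])
    print(solo_share(conn))
```

With the figures from this post, 19,037 / 54,382 ≈ 0.350, a little over a third.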

Not all of those keys are “wrong”, but too many of them are, and will never be used again.
Is that a problem? Not really, I think. It does not consume much disk or memory space, certainly not compared to the huge amount of other data in the database. But it sometimes leads to unexpected results in software that queries the database.
So, the answer to “Where do we leave our Garbage” is: just where it is.
But if you ever come across such a situation, please correct it and remove what is unnecessary or redundant.

Discussion

Comment from RobJN on 2 August 2015 at 00:20

One option is to leave it where it is now but have the editor suggest you change it if you edit that object (for example it could suggest you change “Name” to “name” if the latter does not already exist).

Rob

Comment from Hedaja on 3 August 2015 at 21:05

Taginfo offers a tool to find those misspelled keys: http://taginfo.openstreetmap.org/reports/similar_keys At the same time, a report on the number of keys in use was introduced, too: http://taginfo.openstreetmap.org/reports/historic_development#num_keys As you can see in the graph, after the tool was published in March (http://blog.jochentopf.com/2015-03-05-new-taginfo-features-and-a-challenge.html) there was quite a nice drop. But people are starting to forget the tool, and the numbers are rising again.

Comment from marczoutendijk on 4 August 2015 at 09:33

Thanks @Hedaja for the link to Jochen Topf's blog! I didn't know about that. It seems Jochen and I are concerned about the same thing.
