There is a long discussion happening in the United States section of the community forum regarding where to draw the line between the “main” populated place node values, and specifically the place=* values of city and town in New England. I thought it would be useful to do a bit of analysis to see how these values are distributed across the database when compared to population. Through this analysis, I include all tags which have place values of city, town, village, hamlet, and isolated_dwelling. I also only include nodes that have a population tag.
My overpass query for each category looks like this:
[out:csv(::id,place,population;true;"|")][timeout:60];
node[place=city][population];
out;
One of the challenges of analyzing this key is that because it represents order-of-magnitude differences, its distribution is log-normal. In other words, it forms a bell curve provided that the X-axis is drawn logarithmically.
To look at this data logarithmically, I grouped the place nodes logarithmically, in steps of 1, 2, and 5 per 10x jump. When viewing the distribution of place=town, the log-normal shape comes out quite clearly. The number on the X axis represents the upper limit of each bin.


