I received some valuable feedback from the imports mailing list on data quality and on how much OSM experience someone is expected to have before running large-scale automated data imports. I was fairly well prepared on the data quality side, but it looks like I would need more hand-holding from experienced mappers and importers to properly execute a big import.
This is not a problem, though. It’s a very reasonable expectation, and thankfully not a deal breaker for me, because I chose a scope small enough that it’s feasible to execute this import manually instead of automating it. In fact, my previous estimate that the VSI had about 570 coffee/café-related businesses was an overestimate because (rookie mistake) I forgot to deduplicate by survey period.
The new numbers are:
- OSM Nodes Matching Coffee/Cafe: 574
- VSI Stores Matching Coffee/Cafe: 278
- OSM Nodes within 10 m of a VSI Store: 28
- OSM Nodes within 25 m of a VSI Store: 195
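For anyone curious how proximity counts like these can be computed, here is a minimal Python sketch. It is not my actual dbt + BigQuery query: the function names and the coordinates are made up for illustration, and it assumes plain latitude/longitude pairs with a simple haversine distance.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_nodes_near_stores(osm_nodes, vsi_stores, radius_m):
    """Count OSM nodes that have at least one VSI store within radius_m."""
    return sum(
        1
        for (nlat, nlon) in osm_nodes
        if any(
            haversine_m(nlat, nlon, slat, slon) <= radius_m
            for (slat, slon) in vsi_stores
        )
    )


# Made-up coordinates, purely for illustration (not real OSM or VSI data).
vsi_stores = [(0.0, 0.0)]
osm_nodes = [
    (0.0, 0.00005),  # roughly 5.6 m from the store
    (0.0, 0.0002),   # roughly 22 m from the store
    (0.01, 0.01),    # well over a kilometre away
]

for radius in (10, 25):
    print(radius, count_nodes_near_stores(osm_nodes, vsi_stores, radius))
```

The brute-force comparison of every node against every store is fine at this scale (a few hundred rows on each side); a larger dataset would call for a spatial index or the warehouse's own geography functions.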
So yeah, lots of nearby matches to investigate. Now is the time to start fuzzy matching the business names, and SK53 provided me with some good reading material on that. It will be a bit challenging to do in pure SQL (I’m trying to use only dbt + BigQuery for now), but I think it’s worth a try.
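To make the idea of fuzzy name matching concrete, here is a rough sketch in Python rather than SQL. It is only an illustration: it uses the standard library's `difflib.SequenceMatcher`, which is not necessarily the approach from SK53's reading material, and the helper names, the sample business names, and the 0.8 threshold are all assumptions of mine.

```python
import difflib
import re
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two names after normalization."""
    return difflib.SequenceMatcher(
        None, normalize_name(a), normalize_name(b)
    ).ratio()


def best_match(osm_name, vsi_names, threshold=0.8):
    """Return (vsi_name, score) for the most similar name, or None if
    nothing reaches the (arbitrary, assumed) threshold."""
    if not vsi_names:
        return None
    score, name = max((name_similarity(osm_name, v), v) for v in vsi_names)
    return (name, score) if score >= threshold else None


# Made-up business names, purely for illustration.
candidates = ["Red Kettle Tea House", "Maple Leaf Café"]
print(best_match("maple leaf cafe", candidates))
print(best_match("Zzz", candidates))
```

In practice the candidates would be limited to the VSI stores within 25 m of each OSM node, so a fairly loose name threshold is less risky than it would be across the whole dataset.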