Mechanical Edits/Mateusz Konieczny - bot account/import NKD websites in Germany

This page describes the import of website=* for NKD POIs in Germany.

The tricky part is that, due to my mistake, most of the import has already happened. I am sorry for that; it is obviously the wrong order.

The list of affected objects is at Mechanical Edits/Mateusz Konieczny - bot account/NKD_list. It can also be used to judge data quality.

If the import is rejected or not accepted, I will revert all such edits.

If the import is accepted, I will post comments on the relevant changesets explaining that it was ultimately reviewed and accepted. I will also make future edits based on new matches once they are found, which is especially likely for newly opened shops.

Goals

To add website=* where it is missing or imprecise.

This is part of multiple ATP-based imports I am running.

To provide unique POI identification, preparing the ground for importing more data.

Schedule

Depends on the availability of my free hobby time.

Import Data

Background

Note: if some links are broken, check https://status.codeberg.eu/status/codeberg and https://www.githubstatus.com/


Data source site: https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/import_possibilities_website_tag_empik_pl.geojson, produced via https://matkoniecz.codeberg.page/improving_openstreetmap_using_alltheplaces_dataset/ from the ATP dataset (https://www.alltheplaces.xyz/), which is itself produced from first-party NKD POI data (see https://community.openstreetmap.org/t/what-you-think-about-importing-opening-hours-data-from-alltheplaces/120608/77 for analysis)
Data license: see below
Type of license (if applicable): see below
Link to permission (if required): https://osmfoundation.org/wiki/Licensing_Working_Group/Minutes/2023-08-14#Ticket#2023081110000064_%E2%80%94_First_party_websites_as_sources
OSM attribution (if required): not required
ODbL Compliance verified: yes

Import Type

Recurring import done with automated scripts

Data Preparation

The data is published by NKD.

The data is crawled and published by https://github.com/alltheplaces/alltheplaces from public website(s).

ATP data and OSM data are then processed, validated and compared by https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data

Processing includes, but is not limited to:

  • matching ATP and OSM POIs
  • skipping ATP POIs not matched well to any OSM POIs
  • skipping ATP POIs matched to multiple OSM POIs
  • skipping cases where matched ATP and OSM entries are conflicting on important aspects
  • skipping cases where OSM has specific website tags, but including cases where OSM website=* links main page instead of a specific POI
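
The skip rules above can be sketched roughly as follows. This is only an illustration and not the code from the linked repository: the function names, the dictionary shapes and the brand-only conflict check are all assumptions, and the "not matched well" quality check is not modelled beyond requiring exactly one match.

```python
from urllib.parse import urlparse


def is_main_page(url: str) -> bool:
    """Return True if the URL points at a site root rather than a specific page."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    return parsed.path in ("", "/") and not parsed.query


def should_import_website(atp_entry: dict, osm_matches: list[dict]) -> bool:
    """Hypothetical filter: may the ATP website be written to the matched OSM object?"""
    # Skip ATP POIs with no OSM match and those matched to multiple OSM POIs.
    if len(osm_matches) != 1:
        return False
    osm_tags = osm_matches[0]

    atp_website = atp_entry.get("website")
    if not atp_website:
        return False

    # Skip conflicting entries (assumption: only the brand is compared here).
    atp_brand = atp_entry.get("brand")
    osm_brand = osm_tags.get("brand")
    if atp_brand and osm_brand and atp_brand != osm_brand:
        return False

    osm_website = osm_tags.get("website")
    if osm_website is None:
        return True  # website=* is missing, so it can be added

    # An existing specific link is left alone; a main-page link may be
    # replaced by a POI-specific one.
    return is_main_page(osm_website) and not is_main_page(atp_website)
```

With placeholder URLs, an ATP entry carrying `https://example.com/store/1` would be accepted for an OSM object tagged `website=https://example.com/`, but rejected for one already tagged `website=https://example.com/store/2`.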

See

for tests that also document considered cases and behaviour.

Tagging Plans

website=* from ATP goes into website=*

Changeset Tags

Data Merge Workflow

Team Approach

I am doing this import myself but

Workflow

The import will be done by executing a previously prepared script. Edits will be monitored, and a sample of edited objects will be checked in an attempt to detect any previously missed problems and bugs.

A separate changeset will be used for each POI.

In case of bad, broken or otherwise problematic data, such edits will be reverted. I have experience with reverting my own automated edits, though it was not often needed. I will be using a separate account to make such cleanup easier, should it ever be needed.

Edits will be made using the Mateusz Konieczny - bot account - ATP import account.

Conflation

Done with custom software residing at https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data

This software is intended to enable processing of ATP shop-type POI data in general.

QA

Samples of the data were inspected manually.

The data was also reviewed by a variety of automated QA checks; see the scripts in https://codeberg.org/matkoniecz/list_how_openstreetmap_can_be_improved_with_alltheplaces_data. A sample of the problems found was reported back to the All the Places project in the form of issues and/or patches.

Discussion

The post to the community forum can be found at https://community.openstreetmap.org/t/proposed-imports-of-website-tags-for-nkd-based-on-atp-first-party-data/138589

Conflict of interest info

I received grant funding for making software that processes ATP data.

Time for making the import itself was deliberately not included in the grant to reduce conflict of interest, and I have already received the entire funding.

Not doing the import at all will not block the grant itself (again, it was set up this way to reduce conflict of interest).

I am not doing this import because the funder requires me to do so; rather, I obtained funding to make this kind of import possible.