Automated edits/Ben d8a2ae3a bot

From OpenStreetMap Wiki

What

Change dead links from e.g. website=https://dead.link/ to disused:website=https://dead.link/.
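The tag change itself is a key rename that leaves the value untouched. The following is a minimal sketch, not taken from the bot's source. It assumes a hypothetical `is_dead` predicate supplied by the caller, and it uses a partial key set drawn from the tags listed further down this page. It also declines to overwrite a `disused:` key that already exists.

```python
from collections.abc import Callable

# Partial set of URL-bearing keys (the page lists these "etc.").
URL_KEYS = {"website", "contact:website", "contact:url", "source:url",
            "website:booking", "internet"}


def mark_disused(tags: dict[str, str],
                 is_dead: Callable[[str], bool]) -> dict[str, str]:
    """Return a copy of `tags` in which every URL-bearing key whose value
    is judged dead by the (hypothetical) `is_dead` predicate is renamed to
    `disused:<key>`. The value itself is kept unchanged."""
    result: dict[str, str] = {}
    for key, value in tags.items():
        new_key = "disused:" + key
        # Only rename if the key carries a URL, the URL is dead, and we
        # would not clobber an existing disused: tag.
        if key in URL_KEYS and is_dead(value) and new_key not in tags:
            result[new_key] = value
        else:
            result[key] = value
    return result
```

For example, `mark_disused({"name": "Café X", "website": "https://dead.link/"}, lambda u: u == "https://dead.link/")` yields `{"name": "Café X", "disused:website": "https://dead.link/"}`.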

Why

OSM data contains masses of dead links of various kinds.

This makes it look unreliable, outdated, and much less useful.

Let's fix that!

Note that simply *removing* those tags would be a bad idea: an outdated website is an easily detectable marker for finding possibly outdated nodes, ways, amenities, etc.

Where

Germany. I'm German, so I find it trivially easy to inspect and evaluate lots of German-language websites.

The code is open source, so feel free to run it on your own country, area, or whatever.

How

  • An extractor downloads and analyzes the German map, and extracts all URLs from a set of tags, like website, contact:website, contact:url, source:url, website:booking, internet, etc.
  • A crawler visits each domain once a month. Here, "domain" means the registered domain as defined by the Public Suffix List. A URL within that domain is picked at random, with preference for URLs that have never been visited before.
  • If all requests to a domain fail for the exact same reason for 6+ months in a row, that domain is considered to be dead.
  • (proposed) Every now and then, pick a geohash area with sufficiently many dead links, and upload a changeset to add the disused: lifecycle prefix.
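The extractor and the per-domain grouping from the steps above can be sketched as follows. This is not the bot's actual code. It makes several assumptions. Multiple URLs in one value are taken to be separated by `;` (the usual OSM multi-value convention). Values without an `http(s)://` scheme are simply skipped. `TOY_SUFFIXES` is a tiny hypothetical stand-in for the real Public Suffix List, and the real list's wildcard (`*.`) and exception (`!`) rules are not handled.

```python
from collections import defaultdict
from typing import Optional
from urllib.parse import urlsplit

# Partial list of URL-bearing keys named on this page.
URL_KEYS = ("website", "contact:website", "contact:url", "source:url",
            "website:booking", "internet")

# Hypothetical toy subset of the Public Suffix List.
TOY_SUFFIXES = {"de", "com", "org", "io", "uk", "co.uk", "github.io"}


def extract_urls(tags: dict[str, str]) -> list[str]:
    """Collect http(s) URLs from the URL-bearing keys of one OSM object."""
    urls = []
    for key in URL_KEYS:
        for part in tags.get(key, "").split(";"):
            part = part.strip()
            # Values without a scheme are skipped in this sketch.
            if part.startswith(("http://", "https://")):
                urls.append(part)
    return urls


def registered_domain(url: str, suffixes=TOY_SUFFIXES) -> Optional[str]:
    """Public suffix plus one more label, e.g. www.example.co.uk -> example.co.uk."""
    labels = (urlsplit(url).hostname or "").rstrip(".").split(".")
    # PSL's implicit default rule "*": the last label alone is a suffix.
    suffix_len = 1
    # Scan from the longest candidate to the shortest; first hit wins.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            suffix_len = len(labels) - i
            break
    if len(labels) <= suffix_len:
        return None  # the host is itself a public suffix
    return ".".join(labels[-(suffix_len + 1):])


def group_by_domain(urls: list[str]) -> dict[str, set[str]]:
    """Group URLs by registered domain, so each domain is crawled once."""
    groups: dict[str, set[str]] = defaultdict(set)
    for url in urls:
        domain = registered_domain(url)
        if domain is not None:
            groups[domain].add(url)
    return dict(groups)
```

Grouping by registered domain rather than by full host name means `www.example.de` and `shop.example.de` count as one domain and share a single monthly visit.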
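The monthly scheduling and the URL choice in the crawler step could look roughly like this. It is a sketch under two assumptions that the page does not spell out. First, "once a month" is read as "once per calendar month", with never-visited domains (`None`) always due. Second, "preference" for unvisited URLs is implemented as a strict rule: if any URL has never been visited, only those are eligible.

```python
import random
from datetime import date
from typing import Optional


def due_domains(last_visit: dict[str, Optional[date]], today: date) -> list[str]:
    """Domains not yet visited in the current calendar month (assumption)."""
    return sorted(
        domain
        for domain, last in last_visit.items()
        if last is None or (last.year, last.month) != (today.year, today.month)
    )


def pick_url(urls: list[str], visit_counts: dict[str, int],
             rng: random.Random) -> str:
    """Pick one URL of a domain at random, preferring never-visited URLs."""
    unvisited = [u for u in urls if visit_counts.get(u, 0) == 0]
    pool = unvisited or urls
    # Sort first so that a seeded rng gives reproducible picks.
    return rng.choice(sorted(pool))
```

A softer alternative would weight unvisited URLs more heavily instead of excluding visited ones. The strict rule simply cycles through all URLs of a domain before revisiting any of them.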
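The dead-domain rule above ("all requests fail for the exact same reason for 6+ months in a row") can be expressed as a check on the most recent monthly results. In this minimal sketch, the history is assumed to hold one entry per month, oldest first, with no gaps. `None` means the domain responded, and a string is a hypothetical failure-reason label such as `"DNS_NXDOMAIN"`.

```python
from typing import Optional

MIN_MONTHS = 6  # "6+ months in a row", as stated on this page


def is_dead(monthly_results: list[Optional[str]],
            min_months: int = MIN_MONTHS) -> bool:
    """True if the last `min_months` results are all failures with the
    identical reason. Any success or a change of reason resets the streak."""
    if len(monthly_results) < min_months:
        return False
    recent = monthly_results[-min_months:]
    first = recent[0]
    return first is not None and all(r == first for r in recent)
```

Requiring the *same* reason throughout is conservative. A domain that alternates between, say, a timeout and an HTTP error is not flagged, because that pattern may indicate a flaky server rather than a dead one.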
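For the proposed batching step, objects with dead links can be bucketed by geohash cell, and a cell is selected once it holds enough of them. This sketch encodes standard geohashes (longitude on even bits, 32-character base32 alphabet). The cell size (`precision`), the threshold (`min_dead`), and the choice of the fullest cell are hypothetical parameters, not values from the bot.

```python
from collections import Counter
from typing import Optional

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(lat: float, lon: float, precision: int = 5) -> str:
    """Standard geohash: interleave lon/lat bisection bits, 5 bits per char."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    chars, bits, n_bits, even = [], 0, 0, True
    while len(chars) < precision:
        if even:  # even bit positions refine longitude
            mid = (lon_lo + lon_hi) / 2
            bits = (bits << 1) | (lon >= mid)
            lon_lo, lon_hi = (mid, lon_hi) if lon >= mid else (lon_lo, mid)
        else:     # odd bit positions refine latitude
            mid = (lat_lo + lat_hi) / 2
            bits = (bits << 1) | (lat >= mid)
            lat_lo, lat_hi = (mid, lat_hi) if lat >= mid else (lat_lo, mid)
        even = not even
        n_bits += 1
        if n_bits == 5:
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def pick_batch_area(dead_points: list[tuple[float, float]],
                    precision: int = 5, min_dead: int = 20) -> Optional[str]:
    """Return the geohash cell with the most dead-link objects, if it has
    at least `min_dead` of them; otherwise None (nothing worth a changeset)."""
    counts = Counter(geohash(lat, lon, precision) for lat, lon in dead_points)
    if not counts:
        return None
    cell, n = counts.most_common(1)[0]
    return cell if n >= min_dead else None
```

Because geohash cells nest (a longer hash always starts with the shorter one), changing `precision` trades changeset size against geographic compactness without changing the grouping logic.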

Consultation

The consultation process started on 2024-01-24 at [1].

Hence, the bot is not creating any changesets yet.

Once it does, I will ramp up slowly to leave ample time for problems to be detected. In the beginning, I will probably double-check everything manually.

Result on 2024-02-02: No automated edits for now; instead, let's see how well exporting to Osmose works in practice. (I expect this to take at least until 2025.)

Concerns mentioned in the AECoC

  • This bot is in the category “Useful edits that would be tedious to do manually, only after approval by the community and appropriate discussion.”
  • None of the cases listed under “Problematic usage” apply.
  • [2] TODO

Who

TODO

Logs

TODO