Neural machine translation (NMT) is a method of machine translation that uses deep learning to improve translation accuracy. The success of ChatGPT already shows the great potential of generative AI and transformer-based language models. This diary investigates the feasibility and performance of applying neural machine translation to OpenStreetMap by fine-tuning a pretrained translation model on OpenStreetMap data.

How to fine-tune a pretrained translation model on OSM data

I first found a pretrained translation model on Hugging Face that translates from Chinese to English: https://huggingface.co/Helsinki-NLP/opus-mt-zh-en. It is a MarianMT model with 77 million parameters and ~300MB on disk, so it is a small model. In comparison, GPT-3 has 175 billion parameters.
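As a rough illustration of this step, the sketch below loads a MarianMT checkpoint with the Hugging Face `transformers` library and translates a batch of names. It assumes the Chinese-to-English checkpoint `Helsinki-NLP/opus-mt-zh-en` and that `transformers`, `sentencepiece` and `torch` are installed. The helper names (`load_translator`, `translate`, `approx_size_mb`) are made up for this example. The last helper only checks the size arithmetic: 77 million 32-bit parameters take roughly 308 MB, which matches the ~300MB figure.

```python
MODEL_NAME = "Helsinki-NLP/opus-mt-zh-en"  # assumed Chinese-to-English checkpoint


def load_translator(model_name: str = MODEL_NAME):
    """Download (on first use) and return the tokenizer and the model."""
    # Imported lazily because transformers and torch are heavy dependencies.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    return tokenizer, model


def translate(texts, tokenizer, model):
    """Translate a list of source strings with the given tokenizer and model."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def approx_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the weights, assuming 4-byte (float32) parameters."""
    return num_params * bytes_per_param / 1e6
```

Calling `translate(["台北車站"], *load_translator())` downloads the checkpoint on first use and returns a list with one English string.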

Then, from OpenStreetMap, I collected all the existing Chinese-English translation pairs for map objects located in Taiwan (as of 2023/01/31) and split them into training data and test data. I fine-tuned the pretrained translation model on the training data for five epochs. Finally, I evaluated the performance of the fine-tuned model on the test data.
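The data preparation could look roughly like the sketch below. It is not the code from the diary's repository. It assumes the map objects are already available as dictionaries with a `tags` mapping (for example, parsed from an OSM extract), that the Chinese name is in `name:zh` or, failing that, `name`, and that the English name is in `name:en`. The function names, the 10% test fraction and the seed are all arbitrary choices for illustration.

```python
import random


def extract_pairs(elements):
    """Collect unique (Chinese, English) name pairs from OSM-like elements."""
    pairs = []
    seen = set()
    for element in elements:
        tags = element.get("tags", {})
        # Assumption: Chinese name is in name:zh, falling back to name.
        zh = (tags.get("name:zh") or tags.get("name") or "").strip()
        en = (tags.get("name:en") or "").strip()
        # Skip objects without both names, untranslated names, and duplicates.
        if zh and en and zh != en and (zh, en) not in seen:
            seen.add((zh, en))
            pairs.append({"zh": zh, "en": en})
    return pairs


def train_test_split(pairs, test_fraction=0.1, seed=42):
    """Shuffle deterministically and split into (train, test) lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Deduplicating before the split keeps the same pair from landing in both the training and the test data, which would otherwise make the evaluation look better than it is.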

The code to fine-tune the translation model is here: https://github.com/liyinxiao/neural-machine-translation-on-OpenStreetMap
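For readers who want a feel for what such fine-tuning involves before opening the repository, here is a minimal sketch using the `Seq2SeqTrainer` API from Hugging Face `transformers` together with the `datasets` library. It is an independent illustration and may differ from the repository's code. Only the five epochs come from this diary. The checkpoint name, learning rate, batch size, maximum length and output directory are assumptions, and the training pairs are assumed to be a list of `{"zh": ..., "en": ...}` dictionaries.

```python
# Hypothetical settings: only num_train_epochs is taken from the diary.
TRAINING_CONFIG = {
    "model_name": "Helsinki-NLP/opus-mt-zh-en",  # assumed checkpoint
    "num_train_epochs": 5,
    "learning_rate": 2e-5,               # assumed
    "per_device_train_batch_size": 16,   # assumed
    "max_length": 64,                    # assumed; names are short
    "output_dir": "opus-mt-zh-en-osm",   # hypothetical directory name
}


def fine_tune(train_pairs, config=TRAINING_CONFIG):
    """Fine-tune the pretrained model on a list of {"zh", "en"} dicts."""
    # Imported lazily because these are heavy, optional dependencies.
    from datasets import Dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(config["model_name"])
    model = AutoModelForSeq2SeqLM.from_pretrained(config["model_name"])

    def tokenize(batch):
        # text_target makes the tokenizer also produce the "labels" field.
        return tokenizer(
            batch["zh"],
            text_target=batch["en"],
            max_length=config["max_length"],
            truncation=True,
        )

    dataset = Dataset.from_list(train_pairs).map(
        tokenize, batched=True, remove_columns=["zh", "en"]
    )
    args = Seq2SeqTrainingArguments(
        output_dir=config["output_dir"],
        num_train_epochs=config["num_train_epochs"],
        learning_rate=config["learning_rate"],
        per_device_train_batch_size=config["per_device_train_batch_size"],
        save_strategy="epoch",
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        # Pads inputs and labels to the longest example in each batch.
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    trainer.save_model(config["output_dir"])
    tokenizer.save_pretrained(config["output_dir"])
    return config["output_dir"]
```

The saved directory can then be loaded with `from_pretrained` in place of the original checkpoint name.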

Evaluations

Based on a manual inspection of the first 200 rows of the test data, the fine-tuned model's translations look good, and the model performs especially well on ways. The details of the evaluation can be found at https://github.com/liyinxiao/neural-machine-translation-on-OpenStreetMap.
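The evaluation here was manual. As a possible complement, and purely as an assumption about what an automatic check could look like, the sketch below compares model outputs with the existing `name:en` values using an exact-match rate and a character-level similarity from Python's standard `difflib`. The function names and the choice of metrics are illustrative and were not used to produce the results in this diary.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def evaluate(predictions, references):
    """Return the exact-match rate and mean character similarity (0 to 1)."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    exact = 0
    similarities = []
    for predicted, reference in zip(predictions, references):
        predicted, reference = normalize(predicted), normalize(reference)
        exact += predicted == reference
        similarities.append(SequenceMatcher(None, predicted, reference).ratio())
    total = len(references)
    return {
        "exact_match": exact / total,
        "mean_similarity": sum(similarities) / total,
    }
```

Exact match is strict for place names, where "Zhongshan Rd." and "Zhongshan Road" are both acceptable, so the similarity score and a manual look at the mismatches remain useful.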

Conclusion

  • It takes only a few lines of code to apply transfer learning and fine-tune a pretrained translation model for OpenStreetMap use cases. This process does not require any prior knowledge of transformers.
  • On the inspected test rows, the performance of the fine-tuned model is already similar to human translations, after 5 epochs of fine-tuning a pretrained model that’s only ~300MB on disk. This performance can be further improved.

Potential applications

  • OpenStreetMap editing assistant: when tags['name'] is entered, the assistant can auto-generate its translation for manual review.
  • Use machine-generated translations without manual review.
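The editing-assistant idea above can be sketched as a small function that proposes a `name:en` value without ever overwriting an existing one. Everything here is hypothetical: `suggest_name_en` is not part of any OSM editor, and `translate_fn` stands for whatever translation backend is plugged in, such as the fine-tuned model.

```python
from typing import Callable, Dict, Optional


def suggest_name_en(
    tags: Dict[str, str], translate_fn: Callable[[str], str]
) -> Optional[Dict[str, object]]:
    """Propose a name:en tag for manual review, or None if not applicable."""
    name = tags.get("name", "").strip()
    # Nothing to translate, or a human-entered translation already exists.
    if not name or tags.get("name:en"):
        return None
    suggestion = translate_fn(name).strip()
    if not suggestion:
        return None
    return {
        "key": "name:en",
        "value": suggestion,
        "source": name,
        "needs_review": True,  # the mapper confirms or edits before saving
    }
```

Returning a proposal instead of modifying `tags` keeps the mapper in control, which matches the first use case. The second use case would apply the value directly and skip the review flag.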