SIGCDMX import in Mexico City
SIGCDMX is a dataset of building footprints and landuse covering Mexico City, Mexico. See more at https://www.openstreetmap.org/user/inserteunnombreaqui/diary/407242
Goals
The goal is to map building footprints and landuse. The data cannot be imported directly without heavy human supervision and review, so QA is vital.
Schedule
The Cuauhtémoc alcaldía should be worked on first by all members of the import process. Once a workflow is established, the areas around Cuauhtémoc will be worked on next, and finally the rest of Mexico City.
Import Data
Background
- Data source site: https://sig.cdmx.gob.mx/datos/descarga
- Data license: According to https://www.seduvi.cdmx.gob.mx/servicios/servicio/ciudad_mx, the data is in the public domain, as is any document produced by CDMX. However, attribution is appreciated: Agencia Digital de Innovación Pública “Sistema Abierto de Información Geográfica (SIGCDMX)” Disponible en: https://sig.cdmx.gob.mx/ Note: in the Sistema de Datos Abiertos I saw that the dataset has a CC-BY license, but the site has since gone down. On archive.org (https://web.archive.org/web/20220819024617/https://datos.cdmx.gob.mx/api/3/action/package_show?id=uso-de-suelo) the JSON indicates: “license_id”: “cc-by”, “license_title”: “Creative Commons Attribution”, “license_url”: “https://web.archive.org/web/20220819024617/http://www.opendefinition.org/licenses/cc-by”.
- OSM attribution (if required): http://wiki.openstreetmap.org/wiki/Contributors#Mexico
- ODbL Compliance verified: yes
OSM Data Files
I download the data from the download page (https://sig.cdmx.gob.mx/datos/descarga). There are two types of data there: the actual cadastral data (“Descarga de datos del catastro”) and the landuse data (“Descarga de datos SEDUVI”). The cadastral data is basically a set of points, and I find that the only useful information in it is the address data, which is fairly inaccurate.
It is better to download the landuse data, since it has polygons to work with. I download the shapefile from the “Descarga de datos SEDUVI” section. I am working on the Cuauhtémoc alcaldía because it is the central alcaldía of CDMX.
Generally, since houses in Mexico are attached to each other, the landuse polygon is often the building footprint itself. For more complex places such as schools, churches, hospitals and government areas, however, the polygon only outlines the grounds of the complex, not the individual buildings.
Import Type
This is a one-time import that requires human review at every step. The data will be loaded into JOSM and final touches will be made with iD. Members are encouraged to work a few blocks at a time.
Data Preparation
Data Reduction & Simplification
I then use QGIS to process the shapefile data by opening the .shp file. The folder contains 4 files that serve different functions; fortunately QGIS is smart enough to parse them all. There are a few things that I need to do:
- I change the projection from the default to EPSG:4326 (osm.wiki/Converting_to_WGS84#QGIS).
- Using the Processing toolbox, I use the k-means clustering and split vector layer tools to split the alcaldía data into 30 files, which is an arbitrary number. This makes the files much easier to manage and review in JOSM.
- I export each file to GeoJSON format, which is readable by JOSM and OSM.
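For anyone who wants to reproduce the splitting step outside QGIS, here is a minimal sketch of the same idea in plain Python. It is not the QGIS implementation: it runs a simple k-means on rough polygon centroids of a GeoJSON file that is assumed to be already in EPSG:4326, and the function names are made up for illustration.

```python
import random


def rough_centroid(feature):
    """Mean of the outer-ring vertices of a Polygon or MultiPolygon feature."""
    geometry = feature["geometry"]
    coords = geometry["coordinates"]
    ring = coords[0][0] if geometry["type"] == "MultiPolygon" else coords[0]
    ring = ring[:-1]  # GeoJSON rings repeat the first vertex at the end
    return (sum(p[0] for p in ring) / len(ring),
            sum(p[1] for p in ring) / len(ring))


def kmeans_labels(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires k <= len(points)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared distance (in degrees).
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return labels


def split_features(collection, k):
    """Split a GeoJSON FeatureCollection into k spatially compact collections."""
    features = collection["features"]
    labels = kmeans_labels([rough_centroid(f) for f in features], k)
    groups = [[] for _ in range(k)]
    for feature, label in zip(features, labels):
        groups[label].append(feature)
    return [{"type": "FeatureCollection", "features": g} for g in groups]
```

Loading the exported alcaldía file with json.load, calling split_features(data, k=30) and writing each part with json.dump should give files comparable to the 30 files described above, although the cluster boundaries will not match the QGIS output exactly.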
Tagging Plans
See the Workflow section.
Changeset Tags
Changesets will be tagged with source=SIGCDMX
Data Merge Workflow
Team Approach
This is a team effort of the CDMX community.
Workflow
I estimate that each GeoJSON file has about 1,000 buildings.
- I open one GeoJSON file in JOSM and then, on another layer, download the OSM data for the same area. We now have two layers: our SIGCDMX data and the OSM data. This is the hardest step, since I need to figure out how to make the SIGCDMX data live peacefully with the OSM data. This is how I do it, in order:
  - Use Ctrl+F and search “type:way” to select all ways. Then I remove all the junk tags and add “building=yes”, since about 90% of the polygons reflect the actual building footprints in Mexico, where the houses are attached to each other.
  - Simplify the areas with a JOSM plugin named “SimplifyArea”. With all ways selected, I click the button that activates the plugin, and a lot of useless nodes are removed. Then I go to the validation tool and run it, which reports “duplicated nodes” errors. I press the wrench button to auto-fix them, and it works like a charm.
  - For places with existing OSM data, I either adjust the data node by node, occasionally splitting ways, or I use “Ctrl+C” and “Ctrl+Shift+V” to copy the tags from the old OSM shapes to the new geometries and then delete the old ways. Sometimes I find nodes that carry the address, which I usually merge into the building or remove.
  - Reclassify landuse for the remaining 10%. A lot of these polygons are basically “landuse=construction” (people building something), “landuse=brownfield” (land left unused), or “amenity=parking” (people park their cars on empty lots). Some of them are also schools, commercial areas, industry, parks, etc. While doing this I am also careful to align the roads to the new buildings, and the service roads to the parking lots.
- We still have two layers: the SIGCDMX data and the OSM data. After doing most of the processing work, I merge the SIGCDMX layer into the OSM layer by right-clicking on it, choosing merge, and clicking through all the warnings, since the processing work has already been done.
- I then continue to look for problematic areas, sometimes drawing new buildings or editing the geometries. Afterwards, I use the validation tool to look for common errors such as “highways overlap buildings”, “building inside buildings”, “building overlaps buildings”, “self-intersecting ways”, etc. You get the idea.
- Before the final upload, I press the upload button to check for any additional errors and scan the area for any further adjustments. I often use the satellite layer for this purpose. Finally, I upload the data!
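The tag cleanup in the first sub-step (remove the source attributes, add “building=yes”) could also be done on the GeoJSON file before it is opened in JOSM. Below is a minimal sketch of that idea, not part of the workflow I actually use. The function name and the optional `keep` parameter are made up for illustration, and the roughly 10% of polygons that are not buildings would still need to be reclassified by hand afterwards.

```python
import copy


def retag_as_buildings(collection, keep=()):
    """Return a copy of a GeoJSON FeatureCollection in which every polygon
    keeps only the properties named in `keep` and gains building=yes."""
    result = copy.deepcopy(collection)  # leave the input untouched
    for feature in result["features"]:
        geometry = feature.get("geometry") or {}
        if geometry.get("type") not in ("Polygon", "MultiPolygon"):
            continue  # points and lines are left as they are
        old_tags = feature.get("properties") or {}
        new_tags = {key: value for key, value in old_tags.items() if key in keep}
        new_tags["building"] = "yes"
        feature["properties"] = new_tags
    return result
```

Doing this in JOSM by hand has the advantage that you see the polygons while retagging them, so the script only saves the bulk edit, not the review.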
Conflation
See the Workflow section above.
QA
The next step is to redraw building geometries to match the satellite imagery and to do more verification to check that the OSM data is accurate. I then use this page (https://seduvi.cdmx.gob.mx/programas-delegacionales-de-desarrollo-urbano) to map actual landuse data (residential, commercial, retail, etc.), following the example of Quixlian in the Zona Rosa/Colonia Juárez. As for adding addresses, I currently only use Mapillary and street view images. In the future we could merge the cadastre data with the building geometries to get addresses, but I don’t know how to do that yet and I’m not sure whether the city government did a good job with that data.
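One possible way to merge the cadastre points with the building geometries is a point-in-polygon join. The following is only a sketch of that idea in plain Python, assuming both datasets are GeoJSON in EPSG:4326 and the buildings are simple Polygons. The attribute names `calle` and `numero` are hypothetical; the real cadastre field names would have to be checked. A building only receives an address when exactly one point falls inside it, and the result would still need human review because the address data is fairly inaccurate.

```python
def point_in_ring(x, y, ring):
    """Ray casting: toggle on every ring edge crossed by a ray going right from (x, y)."""
    inside = False
    for a, b in zip(ring, ring[1:]):  # GeoJSON rings are closed, so this covers every edge
        if (a[1] > y) != (b[1] > y):
            x_cross = a[0] + (y - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if x < x_cross:
                inside = not inside
    return inside


def attach_addresses(buildings, address_points, street_key="calle", number_key="numero"):
    """Add addr:* tags in place to buildings that contain exactly one address point.

    Returns the number of buildings that were matched. Holes in polygons are ignored.
    """
    matched = 0
    for building in buildings["features"]:
        outer = building["geometry"]["coordinates"][0]
        hits = [
            point for point in address_points["features"]
            if point_in_ring(point["geometry"]["coordinates"][0],
                             point["geometry"]["coordinates"][1], outer)
        ]
        if len(hits) != 1:
            continue  # no point, or an ambiguous match: leave it for manual review
        source = hits[0]["properties"]
        for osm_key, src_key in (("addr:street", street_key),
                                 ("addr:housenumber", number_key)):
            if source.get(src_key):
                building["properties"][osm_key] = str(source[src_key])
        matched += 1
    return matched
```

This compares every point with every building, which should be fine for one of the 30 split files but would be slow for the whole city.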
The post to the community forum was sent on YYYY-MM-DD and can be found here