User:JamesBadger/Osm2pgsql/benchmarks
Importing the OSM data set using osm2pgsql. See Osm2pgsql/benchmarks.
Cloud Instance Import
Configuration
OpenStack IceHouse instance.
General settings
| Setting | Value |
|---|---|
| Created | Aug 27 11:27 MDT |
| Flavor | m1.xxlarge |
| RAM | 32GB |
| VCPUs | 8 |
| Disk | 20GB |
| OS | Ubuntu Server 14.04.1 LTS 64-bit |
| Postgres Version | 9.3.5 |
| osm2pgsql version | 901a996 (May 28 2014) |
| Volumes | 32 GB at /planet, 200 GB at /work |
| Planetfile | PBF - 2014/08/20 |
OpenStack Glance volumes are mounted with ZFS on Linux. Default ZFS compression is enabled for /work. The work/postgresql filesystem uses recordsize=8k and primarycache=metadata, and PostgreSQL has been configured to use it as its data directory.
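For reference, the volumes could be prepared roughly as follows. This is a minimal sketch rather than the exact commands used: the device names are hypothetical, and only the property values come from the tables below.
$ sudo zpool create planet /dev/vdb    # 32 GB volume, mounted at /planet (hypothetical device)
$ sudo zpool create work /dev/vdc      # 200 GB volume, mounted at /work (hypothetical device)
$ sudo zfs set compression=lz4 work
$ sudo zfs set atime=off work
$ sudo zfs create work/log
$ sudo zfs create work/log/postgresql
$ sudo zfs create -o recordsize=8k -o primarycache=metadata work/postgresql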
ZFS Pools and Filesystems
32 GB pool 'planet'
| Name | Used by Dataset | Available | Compression Ratio | Mountpoint |
|---|---|---|---|---|
| planet | 25.5G | 5.73G | 1.00x | /planet |
200 GB pool 'work'
| Name | Used by Dataset | Available | Compression Ratio | Mountpoint | Properties |
|---|---|---|---|---|---|
| work | 32K | 196G | 1.97x | /work | compression=lz4, atime=off |
| work/log | 31K | 196G | 1.00x | /work/log | inherit |
| work/log/postgresql | 30K | 196G | 1.00x | /work/log/postgresql | inherit |
| work/postgresql | 41.8M | 196G | 2.76x | /work/postgresql | recordsize=8k, primarycache=metadata |
PostgreSQL configuration
| Setting | Value |
|---|---|
| shared_buffers | 8MB |
| temp_buffers | 64MB |
| work_mem | 1MB |
| maintenance_work_mem | 4096MB |
| effective_io_concurrency | 1 |
| fsync | off |
| synchronous_commit | off |
| full_page_writes | off |
| checkpoint_segments | 20 |
| checkpoint_completion_target | 0.9 |
| random_page_cost | 3.0 |
| effective_cache_size | 18GB |
| log_destination | csvlog |
| logging_collector | on |
| log_directory | '/work/log/postgresql' |
| log_filename | 'postgresql-%Y-%m-%d_%H%M%S.log' |
| log_rotation_age | 1d |
| log_rotation_size | 1GB |
| log_min_duration_statement | 250ms |
| log_checkpoints | on |
| log_connections | on |
| log_disconnections | on |
| log_duration | on |
| log_line_prefix | '%t [%p]:[%l] user=%u, db=%d ' |
| log_lock_waits | on |
| log_temp_files | 0 |
| autovacuum | off |
| autovacuum_vacuum_scale_factor | 0.04 |
| autovacuum_analyze_scale_factor | 0.02 |
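The key trade-off in these settings is durability for bulk-load speed. As a postgresql.conf excerpt (a sketch assembled from the table above; surrounding defaults are assumed):
# Import-only tuning: durability is deliberately sacrificed for load speed.
# Do not use these settings on a database you cannot afford to rebuild.
fsync = off                        # no WAL flushes; a crash can corrupt the cluster
synchronous_commit = off           # COMMIT returns before WAL reaches disk
full_page_writes = off             # skip full-page images after checkpoints
maintenance_work_mem = 4096MB      # speeds up the large CREATE INDEX runs
checkpoint_segments = 20           # fewer, larger checkpoints during the load
checkpoint_completion_target = 0.9
autovacuum = off                   # vacuum manually once the import is done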
Run
$ sudo -u postgres -i
postgres$ time osm2pgsql --slim -d gis -C 20000 --flat-nodes /work/nodes.cache --number-processes 6 /planet/planet-140827.osm.pbf
osm2pgsql SVN version 0.85.0 (64bit id space)
Using projection SRS 900913 (Spherical Mercator)
Setting up table: planet_osm_point
NOTICE: table "planet_osm_point" does not exist, skipping
NOTICE: table "planet_osm_point_tmp" does not exist, skipping
Setting up table: planet_osm_line
NOTICE: table "planet_osm_line" does not exist, skipping
NOTICE: table "planet_osm_line_tmp" does not exist, skipping
Setting up table: planet_osm_polygon
NOTICE: table "planet_osm_polygon" does not exist, skipping
NOTICE: table "planet_osm_polygon_tmp" does not exist, skipping
Setting up table: planet_osm_roads
NOTICE: table "planet_osm_roads" does not exist, skipping
NOTICE: table "planet_osm_roads_tmp" does not exist, skipping
Using built-in tag processing pipeline
Allocating memory for dense node cache
Allocating dense node cache in one big chunk
Allocating memory for sparse node cache
Sharing dense sparse
Node-cache: cache=20000MB, maxblocks=2560000*8192, allocation method=11
Mid: loading persistent node cache from /work/nodes.cache
Allocated space for persistent node cache file
Maximum node in persistent node cache: 0
Mid: pgsql, scale=100 cache=20000
Setting up table: planet_osm_nodes
NOTICE: table "planet_osm_nodes" does not exist, skipping
Setting up table: planet_osm_ways
NOTICE: table "planet_osm_ways" does not exist, skipping
Setting up table: planet_osm_rels
NOTICE: table "planet_osm_rels" does not exist, skipping
Reading in file: /planet/planet-140827.osm.pbf
Processing: Node(2496510k 498.2k/s) Way(249511k 10.06k/s) Relation(708550 31.34/s)
Standard exception processing way_id 1382119: TopologyException: side location conflict at 1189651.77 8876718.8800000008
Processing: Node(2496510k 498.2k/s) Way(249511k 10.06k/s) Relation(2442080 33.79/s)
Standard exception processing way_id 3631463: TopologyException: side location conflict at 1906862.01 6654069.9800000004
Processing: Node(2496510k 498.2k/s) Way(249511k 10.06k/s) Relation(2552370 34.24/s)
Standard exception processing way_id 3757788: TopologyException: side location conflict at 1322274.0800000001 5383303.8399999999
Processing: Node(2496510k 498.2k/s) Way(249511k 10.06k/s) Relation(2672440 34.37/s)
Standard exception processing way_id 3890452: TopologyException: side location conflict at 1038877.48 5779728.6200000001
Processing: Node(2496510k 498.2k/s) Way(249511k 10.06k/s) Relation(2765210 34.59/s) parse time: 109760s
Node stats: total(2496510292), max(3044821862) in 5011s
Way stats: total(249511922), max(300402050) in 24800s
Relation stats: total(2765219), max(3994576) in 79949s
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads
Going over pending ways...
Maximum node in persistent node cache: 3045064703
160218658 ways are pending
Using 6 helper-processes
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Helper process 4 out of 6 initialised
Helper process 5 out of 6 initialised
Helper process 1 out of 6 initialised
Helper process 2 out of 6 initialised
Helper process 3 out of 6 initialised
Helper process 0 out of 6 initialised
Process 5 finished processing 26703109 ways in 76486 sec
Process 0 finished processing 26703110 ways in 76486 sec
Process 4 finished processing 26703109 ways in 76486 sec
Process 1 finished processing 26703110 ways in 76486 sec
Process 2 finished processing 26703110 ways in 76486 sec
Process 3 finished processing 26703110 ways in 76486 sec
Maximum node in persistent node cache: 3045064703
Maximum node in persistent node cache: 3045064703
Maximum node in persistent node cache: 3045064703
Maximum node in persistent node cache: 3045064703
Maximum node in persistent node cache: 3045064703
All child processes exited
160218658 Pending ways took 76491s at a rate of 2094.61/s
Committing transaction for planet_osm_point
Committing transaction for planet_osm_line
Committing transaction for planet_osm_polygon
Committing transaction for planet_osm_roads
Going over pending relations...
Maximum node in persistent node cache: 3045064703
0 relations are pending
Using 6 helper-processes
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Mid: loading persistent node cache from /work/nodes.cache
Maximum node in persistent node cache: 3045064703
Process 4 finished processing 0 relations in 13 sec
Maximum node in persistent node cache: 3045064703
Process 5 finished processing 0 relations in 14 sec
Maximum node in persistent node cache: 3045064703
Process 3 finished processing 0 relations in 14 sec
Maximum node in persistent node cache: 3045064703
Process 0 finished processing 0 relations in 14 sec
Process 1 finished processing 0 relations in 14 sec
Maximum node in persistent node cache: 3045064703
Process 2 finished processing 0 relations in 14 sec
Maximum node in persistent node cache: 3045064703
All child processes exited
0 Pending relations took 15s at a rate of 0.00/s
node cache: stored: 2343195456(93.86%), storage efficiency: 89.39% (dense blocks: 2387255, sparse nodes: 88445625), hit rate: 92.85%
Sorting data and creating indexes for planet_osm_line
Sorting data and creating indexes for planet_osm_point
Sorting data and creating indexes for planet_osm_polygon
Sorting data and creating indexes for planet_osm_roads
Maximum node in persistent node cache: 3045064703
Stopping table: planet_osm_nodes
Stopped table: planet_osm_nodes in 0s
Stopping table: planet_osm_rels
Stopping table: planet_osm_ways
Building index on table: planet_osm_rels (fastupdate=off)
Building index on table: planet_osm_ways (fastupdate=off)
Analyzing planet_osm_point finished
Analyzing planet_osm_roads finished
Analyzing planet_osm_polygon finished
Analyzing planet_osm_line finished
Stopped table: planet_osm_rels in 2256s
Copying planet_osm_point to cluster by geometry finished
Creating geometry index on planet_osm_point
Copying planet_osm_roads to cluster by geometry finished
Creating geometry index on planet_osm_roads
Creating osm_id index on planet_osm_roads
Creating osm_id index on planet_osm_point
Creating indexes on planet_osm_roads finished
All indexes on planet_osm_roads created in 31410s
Completed planet_osm_roads
Creating indexes on planet_osm_point finished
All indexes on planet_osm_point created in 35766s
Completed planet_osm_point
Copying planet_osm_line to cluster by geometry finished
Creating geometry index on planet_osm_line
Creating osm_id index on planet_osm_line
Creating indexes on planet_osm_line finished
All indexes on planet_osm_line created in 157911s
Completed planet_osm_line
Copying planet_osm_polygon to cluster by geometry finished
Creating geometry index on planet_osm_polygon
Creating osm_id index on planet_osm_polygon
Creating indexes on planet_osm_polygon finished
All indexes on planet_osm_polygon created in 232064s
Completed planet_osm_polygon
Stopped table: planet_osm_ways in 265753s
Osm2pgsql took 452331s overall
real 7538m51.173s
user 2426m23.276s
sys 262m43.906s
7538 minutes is 5 days, 5 hours, and 38 minutes.
I estimate the bottleneck was disk I/O, which is network-backed on this OpenStack cloud.
Post-Run Data Usage
These charts were made using PNP4Nagios, with NSCA scripts running on the host to push data to a remote Nagios instance every 10-15 minutes. Note that the "Free Space" charts are inverted and actually display space used on the filesystem.
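The push itself can be as simple as a cron job feeding df output to send_nsca. A rough sketch, assuming a hypothetical Nagios hostname and service name rather than the actual scripts used:
#!/bin/sh
# Hypothetical passive-check script; hostname, service name, and paths are illustrative.
NAGIOS_HOST="nagios.example.org"
USED=$(df -P /work | awk 'NR==2 {print $5}')
printf '%s\t%s\t%d\t%s\n' "$(hostname)" "disk_work" 0 "used=${USED}" \
  | send_nsca -H "$NAGIOS_HOST" -c /etc/send_nsca.cfg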
Post-Run Info
This server was terminated after it proved unable to vacuum the database properly: a "VACUUM ANALYZE VERBOSE" would not finish, even after running for a full day. This is likely due to the extremely limited disk I/O.
Intel Server Import
Configuration
Circa-2010 Intel Server.
General settings
| Setting | Value |
|---|---|
| RAM | 24 GB DDR3 SDRAM |
| Motherboard | Intel S5520SCR |
| Chassis | Intel SC5600BRP |
| CPUs | 1 x Xeon E5530 2.4GHz |
| Cores | 8 |
| RAID Card | Intel SRCSASBB8I |
| Disk | 2x 300 GB 15,000 RPM SAS disks in hardware RAID 1; 4x 2000 GB 5400-7200 RPM WD Red SATA disks exposed as individual RAID 0 drives, then combined into a RAID 10 ZFS pool |
| OS | Ubuntu Server 14.04.1 LTS 64-bit |
| Postgres Version | 9.3.5 |
| osm2pgsql version | 901a996 (May 28 2014) |
| Planetfile | PBF - 2014/09/03 |
PostgreSQL data directory at /var/lib/postgresql on RAID 1. PostgreSQL logs at /work/log/postgresql on ZFS Pool. Node cache for osm2pgsql at /opt/osm/nodes.cache on RAID 1. Planet file at /work/planet/planet-140903.osm.pbf on ZFS Pool. Compression enabled on ZFS Pool.
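A RAID 10 ZFS pool of this kind is built as two striped mirrors. A minimal sketch, assuming hypothetical device names for the four WD Red drives and lz4 compression as in the cloud run:
$ sudo zpool create work mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
$ sudo zfs set compression=lz4 work
$ sudo zfs create work/log
$ sudo zfs create work/log/postgresql
$ sudo zfs create work/planet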
PostgreSQL configuration
| Setting | Value |
|---|---|
| shared_buffers | 8MB |
| temp_buffers | 64MB |
| work_mem | 48MB |
| maintenance_work_mem | 4096MB |
| effective_io_concurrency | 2 |
| fsync | off |
| synchronous_commit | off |
| full_page_writes | off |
| checkpoint_segments | 20 |
| checkpoint_completion_target | 0.9 |
| effective_cache_size | 16GB |
| autovacuum | on |
| autovacuum_vacuum_scale_factor | 0.04 |
| autovacuum_analyze_scale_factor | 0.02 |
Autovacuum is enabled because I do not want to have to run a manual vacuum afterwards that would take who knows how long.
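Whether autovacuum is keeping up can be checked during or after the import via the standard statistics views; a hypothetical one-liner (the database name matches the run below):
$ psql -d gis -c "SELECT relname, last_autovacuum, last_autoanalyze, n_dead_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;"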
Run
Note that only data above 45° N is imported on this run; that is the area of interest for our implementation. Hstore is also enabled.
$ sudo -u postgres -i
$ service postgresql start
$ createuser osm
$ createdb -E UTF8 -O osm gis
$ psql -f /usr/share/postgresql/9.3/contrib/postgis-2.1/postgis.sql -d gis
$ psql -d gis -c "ALTER TABLE geometry_columns OWNER TO osm; ALTER TABLE spatial_ref_sys OWNER TO osm;"
$ psql -d gis -c "CREATE EXTENSION hstore;"
$ psql -f /usr/share/osm2pgsql/osm2pgsql/900913.sql -d gis
$ time osm2pgsql --slim -d gis -C 16000 --number-processes 8 --bbox -180,45,180,90 --hstore --flat-nodes /opt/osm/nodes.cache /work/planet/planet-140903.osm.pbf
Post-Run Data Usage
TBD
Data Update Statistics
Info on updating the DB to newer versions using changesets. TBD.
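Nothing has been measured yet, but an update run might look roughly like the following: replication diffs fetched with Osmosis (after a one-time --read-replication-interval-init) into an assumed /work/replication directory, then applied with osm2pgsql in append mode. All paths, the cache size, and the bbox simply mirror the import above and are not final.
$ osmosis --read-replication-interval workingDirectory=/work/replication --simplify-change --write-xml-change file=/work/changes.osc.gz
$ time osm2pgsql --append --slim -d gis -C 16000 --number-processes 8 --bbox -180,45,180,90 --hstore --flat-nodes /opt/osm/nodes.cache /work/changes.osc.gz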
