OpenStreetMap
OpenStreetMap NextGen Takes Shape! (screenshots)

I have gone ahead and investigated the issues today 🙂. I removed the global LD_LIBRARY_PATH override, which now applies only to the Python process and nothing else. This should get rid of any potential conflicts you have experienced (you may need to reset your env: rm -r .venv). Regarding the StrEnum issue, this is most likely because FastAPI expects the “str, Enum” base classes instead of the type currently used. I will fix it whenever I work on that portion of the code. Nonetheless, I would recommend waiting for the official documentation to avoid unforeseen issues and unnecessary stress — for both of us 😃.

OpenStreetMap NextGen Takes Shape! (screenshots)

Hey! I am currently focusing on frontend development, and I have not yet polished the points you mentioned. Rest assured, they will be fixed for the dev server launch. As mentioned, in the current state, the application is not yet ready for new contributors :-)

OpenStreetMap Service Availability (2023-12-20 - 2024-01-20)

Fixed the link, thanks!

OpenStreetMap Service Availability (2023-12-20 - 2024-01-20)

Oh okay! Thank you anyway :-) And thank you for just being nice :-)

OpenStreetMap Service Availability (2023-12-20 - 2024-01-20)

I also forgot to mention:

For the endpoint to be marked unavailable, two consecutive checks must fail.

So single connection drops are unlikely to be registered.
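As a minimal sketch (illustrative only, not the actual monitoring code), the two-consecutive-failures rule could look like this:

```python
# Minimal sketch of the "two consecutive failures" rule described above.
# Illustrative only -- not the actual monitoring code.

def availability(check_results: list[bool]) -> list[bool]:
    """Map raw check results (True = check passed) to reported availability.

    An endpoint is only marked unavailable after two consecutive failed
    checks, so a single dropped connection is not registered as downtime.
    """
    reported = []
    consecutive_failures = 0
    for ok in check_results:
        consecutive_failures = 0 if ok else consecutive_failures + 1
        reported.append(consecutive_failures < 2)
    return reported

# A single blip is ignored; two failures in a row count as downtime:
print(availability([True, False, True, False, False, True]))
# [True, True, True, True, False, True]
```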

OpenStreetMap Service Availability (2023-12-20 - 2024-01-20)

@Andy Allan, Hey! I already thought of that, and there are additional connectivity checks to my other server in Poland. I exclude any downtime that is also present on that server. :-)

I also verify connectivity with non-OSM services (to prevent false positives)

By the way, do you happen to know anything about the official OSM uptime configuration? I have noticed it’s more optimistic than mine, which may indicate a higher timeout limit (60 seconds, perhaps?). I would love to see the timeout on the official checks reduced, as they do not seem fully indicative of the average user experience (applications won’t usually wait a full minute for a response).

The OSM Iceberg

Btw, I tried to make mapping of bus relations a little bit more fun: https://github.com/Zaczero/osm-relatify 😉

Dive into the HOT Tasking Manager codebase

Thank you for your insight. It’s really interesting to learn. Have a happy new year 🦀!

OpenStreetMap Service Availability (2023-11-20 - 2023-12-20)

@mmd That’s interesting! Thank you for sharing. :-)

OpenStreetMap Service Availability (2023-11-20 - 2023-12-20)

That’s a good suggestion! However, I don’t think solving this mystery is my priority or of significant importance. I just personally found it interesting :-)

OpenStreetMap Service Availability (2023-11-20 - 2023-12-20)

@TrickyFoxy Thank you! I wasn’t aware of that. I quickly compared the downtime on 12-03 for “Website HTTPS”: the official number was approximately 3 minutes and 30 seconds of downtime, whereas during my observation I noted about 27 minutes of unavailability. Perhaps the official uptime checks are hitting some form of cache, or the downtime was more concentrated in the European region. Next month I will do a more in-depth comparison :-)

🌂 The Past, The Present, The Future

@Firefishy

First of all, thank you for your valuable feedback. That’s quite a lot of information to analyze. Let me address it and provide you with my general thoughts about the current situation.

Transparency

It’s truly awesome to see OWG taking steps towards making information transparent and publicly available. It’s important that we don’t forget the core values of OSM.

Cloud comparison

I am willing to create the comparison, with one caveat: the data I prepare would require supervision prior to its final publication. Given that this task isn’t something I do regularly, there’s a significant chance of errors. However, I think it would be more advantageous to first address the topics I’ll be presenting shortly before diving into the comparison. In my view, preparing the comparison is of lower priority, especially considering that AWS is currently free for us.

Now that we’ve addressed that matter, I can shift my attention to expressing my thoughts and making specific recommendations.

1. Backup deduplication

I want to strongly emphasize the importance of prioritizing backup deduplication. The sooner we implement it, the more streamlined our storage maintenance will be, facilitating smoother transfers—whether to other vendors or onto self-owned hardware. This proactive approach will undoubtedly enhance the overall efficiency and management of our data storage system.

2. openstreetmap-storage-backups - 112.4 TB - $120/month

Backups including some historical. Backups are not de-duped by design (heavy admin / risk burden). Some opportunity to manually clean up, but very low priority. No automatic cleanup.

I cannot provide specific commentary on this just yet. It would be helpful to have a more detailed breakdown first. For instance, what types of backups are being discussed, and what are their sizes? Perhaps you could direct me to a resource that would allow me to deduce this information on my own.

3. openstreetmap-planet - 71.1 TB - $100/month

Historical and current copies of published planet files. Deep-Archive, for future restore to AWS hosted planet service with full back catalog. No automatic cleanups.

I have analyzed https://planet.openstreetmap.org/planet/, but I couldn’t comprehend the need for 70TB. The most recent full-history dump weighs 200GB. Let’s assume we’ll store 5 of those, which would sum up to around 1TB. Historical dumps can be downloaded via torrent as they are currently, or reconstituted from the latest versions when unavailable. If we incorporate non-full-history dumps into the calculation, we can conservatively estimate the total storage requirement to be around 2TB.
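For clarity, the back-of-the-envelope estimate can be written out explicitly (the figures are the rough ones from this post, not measured values):

```python
# Rough storage estimate for planet dumps, using the approximate
# figures from this post (not measured values).

FULL_HISTORY_DUMP_GB = 200   # most recent full-history dump
RETAINED_COPIES = 5          # assume we keep the last 5 full-history dumps

full_history_gb = FULL_HISTORY_DUMP_GB * RETAINED_COPIES   # ~1 TB
non_full_history_gb = 1000   # conservative allowance for regular dumps

total_tb = (full_history_gb + non_full_history_gb) / 1000
print(f"Estimated requirement: ~{total_tb:.0f} TB (vs. 71.1 TB billed)")
```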

4. openstreetmap-tile-aggregated-logs - 32.1 TB - $125/month

Archival of processed tile CDN usage logs. Historical reference for Ops to work out trends and usage patterns. More data here than provided by public logs: Index of /tile_logs. @pnorman can clarify.

Shouldn’t we employ a log sampling method? If the core idea is to analyze usage patterns and detect abuse, sampling even as much as 1% would still reduce our storage requirements a hundredfold.
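As a sketch of what such sampling could look like (a hypothetical stand-alone filter, not Fastly’s actual log pipeline), keeping each log line independently with 1% probability preserves aggregate usage patterns while cutting storage roughly 100-fold:

```python
import random

def sample_lines(lines, rate=0.01, seed=42):
    """Keep each log line independently with probability `rate`.

    Hypothetical illustration -- not the actual CDN log pipeline.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible demo
    return [line for line in lines if rng.random() < rate]

# Synthetic stand-in for 100,000 tile request log lines:
logs = [f"GET /tile/{z}/{x}/0.png" for z in range(20) for x in range(5000)]
kept = sample_lines(logs)
print(f"kept {len(kept)} of {len(logs)} lines")  # roughly 1%
```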

5. openstreetmap-wal - 28.7 TB - $400/month

Live streaming “Write Ahead Log” copies of the OpenStreetMap core Postgres database. The WAL files are used for syncing follower instances of the core Postgres database server. Vital asset to our data recovery plans. Can be used for recovery between full weekly database backups or after corruption. For clarity, this database is private and not published via planet data (eg: messages, users etc). Automatic cleanup after 1 year.

If we are already conducting weekly database backups, retaining WAL files for a duration of 1 year seems excessive. Automatic cleanup should take place within 1 month at most, preferably within 2 weeks. Allocating 29TB of storage solely for WAL files is hard to justify.
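For illustration, a shorter retention could be enforced with an S3 lifecycle rule along these lines. This is a sketch in the shape accepted by boto3’s put_bucket_lifecycle_configuration; the rule ID is made up, and the 31-day figure is my suggested upper bound, not the current setting:

```python
# Sketch of an S3 lifecycle rule expiring WAL objects after 31 days,
# in the shape accepted by boto3's put_bucket_lifecycle_configuration.
# The rule ID is hypothetical; 31 days is my suggested upper bound.

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "expire-wal-after-31-days",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},          # apply to every object
            "Expiration": {"Days": 31},        # delete after 31 days
        }
    ]
}

# With boto3 this would be applied as:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="openstreetmap-wal",
#     LifecycleConfiguration=lifecycle_configuration,
# )
print(lifecycle_configuration["Rules"][0]["ID"])
```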

6. openstreetmap-imagery-backups - 18.2 TB - $35/month

Backups of imagery provided to OpenStreetMap. Deep archival. Primarily backups of imagery hosted on kessie. No automatic cleanups.

I believe deduplication would be a perfect match in this scenario. Once imagery is added, it rarely changes, so any incremental backups would be of negligible size. Employing complete backups every time is not a suitable approach for addressing this kind of problem.
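To illustrate why deduplicated backups of rarely-changing imagery stay small, here is a toy content-addressed store (real tools such as borg or restic also do content-defined chunking and compression, which this sketch omits):

```python
import hashlib

def dedup_store(blobs, chunk_size=4096):
    """Toy content-addressed store: identical chunks are kept only once.

    A minimal illustration of deduplication, not a real backup tool.
    """
    store = {}
    for blob in blobs:
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            # identical chunks hash to the same key and are stored once
            store[hashlib.sha256(chunk).hexdigest()] = chunk
    return store

# Three "backups" of the same 1 MiB image, the last with a 1-byte change:
image = bytes(1024) * 1024
backups = [image, image, image[:-1] + b"\x01"]
store = dedup_store(backups)
raw = sum(len(b) for b in backups)
kept = sum(len(c) for c in store.values())
print(f"raw {raw} bytes -> deduplicated {kept} bytes")
```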

7. openstreetmap-fastly-logs - 5.3 TB - $125/month

Inbound Fastly CDN logs for processing. Key to us finding and managing abuse; source for published tile log analysis: Index of /tile_logs. Automatic cleanup after 31 days.

The same argument as for 4 applies here.

8. openstreetmap-gps-traces - 2.8 TB - $80 to $225/month

The GPS traces that are uploaded to OpenStreetMap.org; the storage backend for the website: Public GPS Traces OpenStreetMap. Formerly provided by NFS service, moved to S3 to simplify admin burden and to seamlessly work across our hosting data centres. No automatic cleanup, but opportunity to improve costs with S3 “tier” lifecycle rules.

I can observe that there are approximately 10,000,000 traces uploaded at the moment, averaging 0.3 MB per trace. When I download a trace from the website, I notice that it is uncompressed, which is consistent with the 0.3 MB estimate. Given that traces are essentially text files, applying basic compression can reduce their size by a factor of 20.
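The effect of basic compression on a text-based trace can be demonstrated with a synthetic GPX-like file (the exact ratio depends on the data; real traces vary, and this repetitive sample is a best-case sketch):

```python
import gzip

# Synthetic GPX-like trace: repetitive, highly compressible text.
# Real traces compress less uniformly; this is a best-case illustration.
points = "".join(
    f'<trkpt lat="52.{i:06d}" lon="21.{i:06d}"><ele>100</ele></trkpt>\n'
    for i in range(5000)
)
trace = f"<gpx><trk><trkseg>\n{points}</trkseg></trk></gpx>".encode()

compressed = gzip.compress(trace)
ratio = len(trace) / len(compressed)
print(f"{len(trace)} -> {len(compressed)} bytes ({ratio:.0f}x smaller)")
```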

9. openstreetmap-fastly-processed-logs - 1.9 TB - $50/month

Archival of processed tile CDN view logs. Historical reference for Ops to work out trends and usage patterns. More data than provided by public logs: Index of /tile_logs. @pnorman can clarify.

The same argument as for 4 applies here.

10. openstreetmap-user-avatars - 113.1 GB - $5/month

The user “avatar” images as uploaded by users. No automatic cleanup, but opportunity to improve costs with S3 “tier” lifecycle rules. Formerly provided by NFS service, moved to S3 to simplify admin burden and to seamlessly work across our hosting data centres.

I don’t have any suggestions; everything appears to be fine.

11. openstreetmap-aws-cloudtrail - 76.0 GB - $2/month

Storage backend for AWS Cloudtrail API access logging service. Security monitoring. No automatic cleanup.

I don’t have any suggestions; everything appears to be fine.

12. openstreetmap-gps-images - 62.7 GB - $10/month

The processed display images used by OpenStreetMap.org on Public GPS Traces OpenStreetMap. Formerly provided by NFS service, moved to S3 to simplify admin burden and to seamlessly work across our hosting data centres.

This also appears to be acceptable, but the cost seems somewhat excessive. I’m not sure about the reason for this.

13. openstreetmap-backups - 21.1 GB - $0.03/month

Historical database backups from OSM’s first few years. No automatic cleanup.

This is not worth discussing. I can only suggest deleting this data completely, as it appears to be entirely redundant and only complicates maintenance.

Summary

Taking into account the suggestions outlined in points 3, 4, 5, 7, 8, and 9, it becomes apparent that these proposed changes are relatively straightforward to implement and would significantly enhance the efficiency of OSM operations. While I wholeheartedly endorse the concept of data deduplication, I will exclude it from my summary calculation for the sake of simplicity.

Based on my calculations, the current monthly expenditure of $100 + $125 + $400 + $125 + $225 + $50 equals $1025. However, by implementing the suggested changes, the projected monthly cost would reduce substantially to $3 + $1.25 + $16 + $1.25 + $10 + $0.5, amounting to $32. This would signify an impressive ~30x reduction in costs. I assume a linear scaling of costs here, which is not entirely accurate, but it’s the best I have been able to come up with given the limited information.
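The arithmetic above, written out for anyone who wants to check it (the figures are from this post, and the projected numbers assume linear scaling, as noted):

```python
# Cost arithmetic from this post, keyed by point number.
# Projected figures assume linear scaling, as stated above.

current = {3: 100, 4: 125, 5: 400, 7: 125, 8: 225, 9: 50}
projected = {3: 3, 4: 1.25, 5: 16, 7: 1.25, 8: 10, 9: 0.5}

total_now = sum(current.values())          # $1025/month
total_projected = sum(projected.values())  # $32/month
print(f"${total_now} -> ${total_projected}: "
      f"~{total_now / total_projected:.0f}x reduction")
```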

I have observed an emerging trend in published information where, rather than optimizing things, we increasingly opt to pour more and more money into them (theoretically it’s free now, but I don’t see that as an excuse). While I understand that volunteer time is often limited, I must highlight that the suggestions I have outlined are neither difficult nor time-consuming to implement.

I trust that you will view this feedback as constructive. Kindly take a moment to reflect upon it, and respond at your convenience.

-K

🌂 The Past, The Present, The Future

@spatialia

I appreciate the clarification, and you’re absolutely right. Upon revisiting, I realized I confused the standard S3 Glacier rates with the S3 Deep Archive costs. The distinction, when properly examined, does align with the numbers mentioned for deep archive storage, which is roughly $1 per TB.

However, it’d be worthwhile to have a comprehensive comparison between various cloud services, especially considering other potential costs. For example, expenses like “openstreetmap-wal - 28.7 TB - $400/month” still seem quite substantial.

Thanks for pointing it out and helping me see the bigger picture.

🌂 The Past, The Present, The Future

@spatialia

That’s indeed a good observation. However, I did my due diligence by checking the S3 Glacier pricing at https://aws.amazon.com/s3/glacier/pricing/, and the cheapest rate I identified was approximately $3.5 per TB. This doesn’t align with the numbers mentioned, so any additional insight that could clarify the discrepancy would be welcome.

🌂 The Past, The Present, The Future

@MxxCon

I’m sorry to say, but the idea of Amazon being evil is not a personal belief; it’s a fact. Please conduct your own research, as discussing it here is not feasible and would demand significant effort on my end. All I ask is for individuals making key commentary to provide justifications for their statements.

Here are some videos you may want to consider watching:

🌂 The Past, The Present, The Future

@apm-wa

I understand how such an offering would be hard to refuse given the cost-saving implications for the OSM project. However, my concern is rooted in the timeline of events. From the data I gathered, OSM’s reliance on AWS dates back to at least 2022, whereas the free AWS credits began just ~6 months ago.

Furthermore, I’d like to express concerns about placing substantial reliance on a corporation like Amazon. Even if services are currently free, Amazon has a track record of making sudden and significant changes to its policies. To my knowledge, there’s no assurance that Amazon’s sponsorship will be perpetual, and transitioning away later could come with considerable costs and complexities. It’s free until the day it isn’t.

🌂 The Past, The Present, The Future

@Friendly_Ghost

I see your viewpoint; allow me to clarify. Here’s a full sentence quote:

I’m not willing to continue the discussion in such an environment, where there’s an overemphasis on subjective views and a lack of grounded argumentation.

This sentence highlights that numerous individuals (not exclusively Firefishy) fail to present well-founded arguments alongside their comments. Continuing a conversation within such an environment proves challenging for me. I am open to engaging with individuals who offer substantiated facts rather than solely relying on their personal beliefs.

As for thanking Firefishy, I would appreciate the opportunity to do so personally. However, I am still awaiting a response from him.

🌂 The Past, The Present, The Future

@apm-wa

Thank you for the comprehensive overview and directing me to those resources. It’s clear from the 2021 OSM community survey and the Strategic Plan Outline that there was a strong consensus within the community to prioritize the stability of core infrastructure. This sheds more light on the overall situation, and I appreciate the effort to provide this context.

The strategic decision to prioritize infrastructure stability by opting for cloud services is understood and commendable, especially considering the rapid growth OSM has experienced. I can see that OSM’s choices were driven by its commitment to the project’s core values and its community.

However, one aspect still nags at me: the choice of Amazon as the cloud service provider. With the wide selection of cloud service providers available, each with its own pricing models and philosophical underpinnings, why was Amazon — a corporation known for its controversial business practices — the chosen one? There are other providers, like Backblaze and Hetzner, which in my research offer competitive, if not more affordable, pricing, and do not have a reputation for commodifying their users to the extent Amazon does.

While the community’s overarching goals are clear, I think it’s crucial for the OSMF to consider not just the technical and financial aspects but also the ethical dimensions when partnering with third-party entities. It’s worth mentioning that Amazon’s reputation, especially concerning privacy and human rights, has reached an almost “meme” status in certain circles. Aligning with such an entity could potentially raise eyebrows, and it’s essential to be conscious of the broader implications of such associations.

🌂 The Past, The Present, The Future

@Friendly_Ghost

Firefishy … only to hear you complaining about “a lack of grounded argumentation” later on without even a “thank you”

Sorry, but what? This sounds like a false accusation, so please provide some evidence that I did indeed point out to Firefishy the lack of grounded argumentation in his detailed budget report. I believe that my only response was that I will not engage in further discussion for reasons mentioned.

🌂 The Past, The Present, The Future

I appreciate the references you’ve provided. Focusing on issue #682, which you’ve specifically mentioned: “Getting credits from AWS” doesn’t explicitly confirm a free sponsorship from AWS. The information I originally gathered already indicates financial investment by OSMF into Amazon, so the point about “getting credits” brings no new information.

It’s important to differentiate between accessibility and public resource. Yes, anyone can create an account on Slack, but that doesn’t classify it as an open public resource. Just as signing up for a newsletter doesn’t make the contents of that newsletter universally accessible.

Transparency in a project like OSM is of utmost importance. I’ve voiced concerns about the project’s direction and transparency, not to attack OSM but to emphasize its importance.

Lastly, I’d appreciate it if you could ensure your arguments are well-considered before presenting them. Continually addressing ambiguities is becoming quite time-consuming for me.