dekstop's Comments
| Post | Comment |
|---|---|
| Distribution of locales (languages) among HOT tasking manager contributors | Nice! And yes, well spotted – that must be a side-effect of how I structured my SQL query for the “combined” tab. Just had a look but I’m not entirely sure what’s going on there. I just managed to get three different counts for three different kinds of summation queries. OSM metadata is notoriously messy, and it’s late in the day… maybe an issue for another day. |
| Distribution of locales (languages) among HOT tasking manager contributors | But your point is well made. English use may be over-inflated in these stats because a fair number of people might prefer to use it over other (native) languages. And as usual, there’ll be all kinds of other reasons to choose particular language settings – including incomplete translations. |
| Distribution of locales (languages) among HOT tasking manager contributors | Indeed, as mentioned above this merely tracks the locale configured per editor – it would be much harder to establish what languages people actually speak :) |
| Distribution of locales (languages) among HOT tasking manager contributors | That’s a great idea! I’ve given you edit permissions to the spreadsheet. |
| Quantifying HOT participation inequality: it's complicated. | Many thanks for the kind words Ralph :) |
| Quantifying HOT participation inequality: it's complicated. | It’s a well-established term in social sciences: https://en.wikipedia.org/wiki/Participation_inequality |
| Quantifying HOT participation inequality: it's complicated. | Very nice! Yes, I don’t show most of the tail in my distribution charts; that would have made it quite unreadable… |
| Quantifying HOT participation inequality: it's complicated. | Haha thank you so much jonwit – that comment made my day! |
| Quantifying HOT participation inequality: it's complicated. | vtcraghead – thanks for that link, it’s always useful to have comparative data sets. Pascal gives some details about his process here: http://neis-one.org/2013/11/typhoon-haiyan-osm-response/ It looks like he’s looking at a much larger activity area, namely: “the Philippines and some parts of the Vietnam coast. This extent is illustrated by a black line on the website.” In comparison I’m only looking at project bounds within the tasking manager of any project that has “Typhoon Haiyan” in the title – 27 of them, all in the project ID range between #338 and #392. Furthermore he’s looking at a longer time period than I am: many of these projects stopped receiving contributions via the tasking manager before year’s end, whereas his contributor number covers the full period up to 1 Jan 2014. osm.wiki/Typhoon_Haiyan has some details about the many mapping activities that took place outside of the tasking manager. In short, he’s looking at a much wider range of activity, whereas I’m only looking at contributions that could plausibly be associated with the HOT tasking manager (a simplified sketch of this bounds-and-period matching appears below the table). However, the boundary between the two is sometimes blurry, and unfortunately there’s no solid method for identifying which specific edits are attributable to TM contributions… I aim to find heuristics that work across most TM activities, and I’ve been updating my methods as I learn more about the nuances; however, the extreme case of Haiyan illustrates that there can still be loads of additional activity in the context of the same disaster event that I’m not considering. In some cases this is intentional – e.g. for my analyses I explicitly do not want to include cleanup work that is organised outside of the tasking manager. When I find some spare time I’ll have a look at the Haiyan contribution timeline to check whether I should update my heuristics – it certainly looks like I might; this is a big margin. At minimum I should state this as a further disclaimer: there may be many activities around particular disaster events that I’m not considering because they cannot be attributed to tasking manager projects. Thanks for posting this! Stats are hard. (As an aside – iirc Haiyan was way before changeset tags were widely used to annotate HOT initiatives, which only really became practical in mid-2014 when iD started supporting it.) |
| Quantifying HOT participation inequality: it's complicated. | (After a long IRC conversation in the #osm channel with SK53 and others.) At some point I’ll write a dedicated post about how I produce the data; people ask about it quite regularly. This should then also clearly state all the aims and known caveats of my methods – atm this is too dispersed across too many places, and whenever I write a new post I tend to forget to mention some of the many necessary disclaimers, which means sometimes people misunderstand my intentions. I’ll put down a few key points here so I can then later extract them as a dedicated post… I’m posting these explorations on my diary to share my findings, but also to stimulate debate – about the HOT community, but also about the methods I use to describe the community. I’m always grateful when people engage with my posts and give specific feedback, because such exchanges tend to involve learning experiences. Sometimes we can clarify misunderstandings, or people might identify gaps in my methodology, and sometimes we just find that people might interpret the same thing quite differently. As long as everyone stays constructive all these things are fair game, and I embrace being challenged. I love learning new things! My main motivation to do any of this is to better understand how volunteers engage with HOT, and by HOT I mean “mapping projects using the tasking manager”. I’m not looking at map quality, or impact on the map, but instead at community engagement and volunteer effort. The process of how I derive “labour hours” is broadly described here, and in some other places all over my diary: @dekstop/diary/35271#comment31077 The paper that first introduced the methodology (by Stuart Geiger and Aaron Halfaker) is concerned with contributor engagement in Wikipedia: http://www.stuartgeiger.com/cscw-sessions.pdf My source data are the edits in the OSM edit history: any creation or version increase of a node, way, or relation, along with their timestamps. I cross-reference these with TM2 project bounds and activity periods (simplified sketches of both steps appear below the table). You do not need to be listed as a HOT project contributor to be considered for my analyses. I’ve found in informal checks that JOSM tends to preserve edit timestamps even when you submit edits in large chunks; however, I do not understand the mechanics well enough to be certain. Validation work is unfortunately not captured well by my statistics, because I have no visibility into their process and can only see actual map edits. This is important to acknowledge because in some HOT initiatives, more experienced contributors are encouraged to become validators. Depending on their process this might mean that they stop contributing to the map, and won’t show up in the OSM edit history – which means they won’t show up in my stats. (On IRC SK53 says that his validation work for Missing Maps involved lots of map edits – “lots of crooked buildings”, and hence my stats probably captured his effort well; this will not always be the case.) And finally – I still hope that someone actually takes my data set and does something entirely different with it :) |
| Quantifying HOT participation inequality: it's complicated. | Simon, I would love it if you took the data I linked above and published an “unbiased” version! |
| Missing Maps: the first year in stats & charts | JIBEC asked via DM how I produced these visualisations. I’ve written a little bit about the data side here: The charts were mostly produced in Python with Matplotlib, using custom visualisation methods that I wrote myself. E.g. the contributor flow plot was a bank holiday project; it took a couple of days to get it just right… The world map was done in QGIS, and the survival plots were made with the Python lifelines library: http://lifelines.readthedocs.org/en/latest/ (a minimal example appears below the table). I’m making heavy use of IPython with inline Matplotlib charts for data exploration; they have essentially replaced most other tools I might have used in the past. Plus it means I don’t have to use R! (No offence to R users, it simply doesn’t fit my mental model at all.) |
| MissingMaps Powwow | It was an incredible experience. It’s such a diverse team, and everyone’s a pleasure to work with. An intense but very stimulating three days! |
| How many HOT contributors never complete their first task? | Hallo rayKiddy! I’m getting the data from the OpenStreetMap edit history [1], that’s a ~50GB compressed XML or PBF file… it takes a substantial amount of work to get data out, but since that’s part of my job [2] I have plenty of experience with that and do it on a regular basis :) The short answer: custom Osmium parsers, a PostgreSQL database, and an import process that matches OSM edits with HOT projects based on information taken from the tasking manager (a minimal Osmium sketch appears below the table). I intend to write more about my process in a future post, but it’ll take a while to get there. Part of the problem is that there are no general-purpose tools for it, so it involves a range of different technologies, and the process changes depending on your needs… |
| How many HOT contributors never complete their first task? | Thanks both for your comments! It’s interesting to hear the many stories behind this, here and on the HOT mailing list. In some cases a lot of consideration goes into the decision not to mark a task as done… and sometimes people simply run out of time before they can finish. Here’s a particularly detailed comment by Jarmo Kivekäs on the HOT list: https://lists.openstreetmap.org/pipermail/hot/2015-August/009938.html |
| Unknown Pleasures | Haha I guess I’ll prepare a merch table for the next SotM! |
| Initial activity and retention of first-time HOT contributors | PierZen – I’m certainly not suggesting that someone who stops contributing to HOT is “lost” for OSM :) In this post I don’t look at OSM retention at all, as I mention above. My aim was to look at HOT participation in isolation, which to me means anything published on the tasking manager, including Missing Maps and a growing number of other projects that are not about disaster response. I also don’t think that any of the activities on OSM are in competition: currently it feels like there is an abundance of potential contributors, and I don’t anticipate an end to this anytime soon. And yes, many OSMers have found other things to keep them busy than just contributing to HOT :) I fairly consistently found that OSM experts tend to have lower average HOT retention than OSM newcomers. Most likely they tend to join briefly for key initiatives, but then continue with other OSM work. joost schouppe: Across all HOT contributions I’ve looked at (including earlier periods), about 10% of users submitted only one changeset. However, even in those cases, some of the changesets have individual timestamps for distinct edits. The aggregate effect is that only 3% of the changesets I’m looking at have a recorded duration of 0. Based on these numbers I agree that we’re probably under-counting the actual duration of people’s sessions; however, I think the effect is relatively mild. Particularly since what we care about is not the greatest accuracy, but merely the ability to compare approximate effort across different initiatives. Regarding your last point about highly prolific experts: I expect (and I’m sure you know this much better than I do) that there are different ways of speeding up your contribution rate by knowing your tools better, so any differences in rate may simply be the result of different tool use. And yes, in many projects across OSM there tend to be a small number of people who just never seem to stop mapping… :) |
| Initial activity and retention of first-time HOT contributors | Thanks Dan! Yes I agree that there are likely different audiences involved; unfortunately that is hard to establish without actually interviewing people. Vincent de Phily – thanks! There are a number of ways in which we could look at local HOT mapper contributions; that’s certainly on my list of things to look at. As far as I’ve seen that is likely to be a much smaller number of people though, at least at the moment, so it might be more tricky to find generalisable observations. And of course they’re hard to identify. Identifying based on contributed “local” knowledge is an interesting thought, although that can also happen remotely via field papers. For a project last year I tried to identify locals by their prior edit history (people who predominantly had local OSM contributions prior to the activation), and found that people with such profiles are exceedingly rare – I think local HOT contributors tend to be new to OSM. In some cases we might be able to identify them based on participation in training events etc. |
| Initial activity and retention of first-time HOT contributors | Thanks mataharimhairi! PlaneMad – it’s all done “by hand”; I’m intending to write a little about that in a future post. I’m using Osmium to extract data from the OSM edit history, data is stored in a Postgres DB, and analysis happens in Python (mostly using IPython and various analysis/visualisation libraries). Went through loads of iterations to find a combination of tools that works well for me, e.g. in earlier work I used Pig on EC2 but that was just too much hassle. No Nepal data yet because I finished my analyses just around the time that happened. PierZen – I’m not looking at OSM activity outside of HOT contributions at all; this is entirely about HOT activity and HOT retention. And yes I agree, there’s much more we could look at :) Including how HOT vs other OSM contribution patterns relate – e.g. do HOT contributors turn into OSM contributors? |
| Initial activity and retention of first-time HOT contributors | dkunce – yes, it should be most interesting to track retention over time as Missing Maps matures. Will definitely keep an eye on it. I’d also like to compare a much larger set of other initiatives, if I can find the time. jonwitt – many thanks! I grouped individual edits by their timestamp, with a session timeout of one hour. (I additionally estimate the duration of the first edit since we don’t know when people actually started. That estimate is simply based on the average time between edits.) A sketch of this session grouping appears below the table. I like “labour hours” as a measure of engagement because it’s much more directly related to a contributor’s effort than the edit counts that are usually produced for such studies. Additionally it is a measure that also makes sense to organisers; e.g. mapathons can be thought of in terms of labour hours. The first time I saw this measure used was in a Wikipedia paper from Geiger/Halfaker in 2013. SimonPoole – Good points! I realise I have much more exposure to the experience of HOT newcomers, through attending mapathons and other community gatherings over the last year. As a result most of my conversations tended to be with people who come from outside the OSM community. Your interpretation certainly sounds plausible as well. I’d be curious about what else you think comes across as biased! I do have a particular perspective, but I also try to make it explicit in my writing when I speculate. The contributor split is: |
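
Several comments above describe the data pipeline: custom Osmium parsers, a PostgreSQL database, session-based “labour hours”, and survival plots. The sketches below are not dekstop’s actual code – they are minimal illustrations of the techniques the comments describe, with assumed names and simplified logic. First, extracting edit metadata from the OSM full-history file; the comments mention custom Osmium parsers, and a pyosmium version might look roughly like this (the CSV output and field selection are assumptions):

```python
# Hypothetical sketch: extract one metadata row per object version from
# an OSM full-history file using pyosmium, for bulk-loading into Postgres.
import csv
import osmium

class EditExtractor(osmium.SimpleHandler):
    """Records who edited what, and when, for every object version."""

    def __init__(self, writer):
        super().__init__()
        self.writer = writer

    def record(self, kind, obj):
        # Object kind, id, version, changeset, user id, edit timestamp.
        self.writer.writerow([kind, obj.id, obj.version, obj.changeset,
                              obj.uid, obj.timestamp.isoformat()])

    def node(self, n):
        self.record('node', n)

    def way(self, w):
        self.record('way', w)

    def relation(self, r):
        self.record('relation', r)

with open('edits.csv', 'w', newline='') as f:
    EditExtractor(csv.writer(f)).apply_file('history-latest.osm.pbf')
```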
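
Next, the attribution heuristic from the Haiyan discussion: an edit plausibly belongs to a tasking manager project if it falls within the project’s bounds and activity period. This sketch assumes rectangular bounds and made-up field names; real TM project geometries are polygons, so treat it purely as an illustration of the idea:

```python
# Hypothetical sketch of the bounds-and-period attribution heuristic:
# an edit plausibly belongs to a TM project if it lies inside the
# project's bounding box and within its activity period.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Project:
    name: str
    min_lat: float
    min_lon: float
    max_lat: float
    max_lon: float
    start: datetime   # first contribution via the tasking manager
    end: datetime     # last contribution via the tasking manager

def candidate_projects(projects, lat, lon, ts):
    """All projects whose bounds and activity period contain this edit."""
    return [p for p in projects
            if p.min_lat <= lat <= p.max_lat
            and p.min_lon <= lon <= p.max_lon
            and p.start <= ts <= p.end]
```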
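
Then the “labour hours” estimate, as described in the retention comments: edits are grouped into sessions with a one-hour timeout, and each session’s first edit is assigned an estimated duration based on the average time between edits. A minimal pandas sketch, with assumed column names:

```python
# Hypothetical sketch of the session-based "labour hours" measure.
# Assumes a DataFrame with 'uid' and 'timestamp' columns, one row per edit.
import pandas as pd

SESSION_TIMEOUT = pd.Timedelta(hours=1)

def labour_hours(edits: pd.DataFrame) -> float:
    total = pd.Timedelta(0)
    for _, user_edits in edits.groupby('uid'):
        ts = user_edits['timestamp'].sort_values()
        gaps = ts.diff().dropna()
        within = gaps[gaps <= SESSION_TIMEOUT]
        total += within.sum()  # observed time between in-session edits
        # Each session's first edit has no preceding gap; estimate its
        # duration with the average time between in-session edits.
        n_sessions = 1 + (gaps > SESSION_TIMEOUT).sum()
        if len(within) > 0:
            total += n_sessions * within.mean()
    return total.total_seconds() / 3600.0
```

Note that a contributor with a single edit contributes zero hours under this scheme, which is consistent with the zero-duration changesets mentioned in the joost schouppe reply above.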
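
Finally, the Missing Maps comment credits the Python lifelines library for the survival plots. A minimal Kaplan-Meier example, with made-up retention data:

```python
# Minimal Kaplan-Meier retention curve with the lifelines library.
# The durations below are invented for illustration.
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

# Days from each contributor's first to last observed contribution;
# observed=1 means we saw them stop, 0 means still active (censored).
durations = [5, 12, 30, 30, 45, 60, 90, 90, 120]
observed = [1, 1, 1, 0, 1, 1, 0, 1, 0]

kmf = KaplanMeierFitter()
kmf.fit(durations, event_observed=observed, label='HOT contributors')
kmf.plot_survival_function()
plt.xlabel('Days since first contribution')
plt.ylabel('Fraction still active')
plt.show()
```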