Off the back of my recent post about privileges I enjoy as a result of my location and first language, even at my highly-multinational employer, and inspired by my colleague Atanas‘ data-mining into where Automatticians are
located, I decided to do another treemap, this time about which countries Automatticians call home:
Where are the Automatticians?
If raw data’s your thing (or if you’re just struggling to make out the names of the countries with fewer Automatticians), here’s
a CSV file for you.
To get a better picture of that, let’s plot a couple of cartograms. This animation cycles between showing countries at (a) their
actual (landmass) size and (b) approximately proportional to the number of Automatticians based in each country:
This animation alternates between showing countries at “actual size” and proportional to the number of Automatticians based there. North America and Europe dominate the map, but there
are other quirks too: look at e.g. how South Africa, New Zealand and India balloon.
Another way to consider the data would be be comparing (a) the population of each country to (b) the number of Automatticians there. Let’s try that:
Here we see countries proportional to their relative population change shape to show number of Automatticians, as seen before. Notice how countries with larger populations like China
shrink away to nothing while those with comparatively lower population density like Australia blow up.
There’s definitely something to learn from these maps about the cultural impact of our employee diversity, but I can’t say more about that right now… primarily because I’m not smart
enough, but also at least in part because I’ve watched the map animations for too long and made myself seasick.
A note on methodology
A few quick notes on methodology, for the nerds out there who’ll want to argue with me:
Country data was extracted directly from Automattic’s internal staff directory today and is based on self-declaration by employees (this is relevant because we employ a relatively
high number of “digital nomads”, some of whom might not consider any one country their home).
Countries were mapped to continents using this dataset.
Maps are scaled using Robinson projection. Take your arguments about this over here.
The treemaps were made using Excel. The cartographs were produced based on work by Gastner MT, Seguy V, More P. [Fast flow-based algorithm for creating density-equalizing map
projections. Proc Natl Acad Sci USA 115(10):E2156–E2164 (2018)].
Some countries have multiple names or varied name spellings and I tried to detect these and line-up the data right but apologies if I made a mess of it and missed yours.
Take a look at the map below. I’m the pink pin here in Oxfordshire. The green pins are my immediate team – the people I work with on a
day-to-day basis – and the blue pins are people outside of my immediate team but in its parent team (Automattic’s org chart is a bit like a fractal).
I’m the pink pin; my immediate team are the green pins. People elsewhere in our parent team are the blue pins. Some pins represent multiple people.
Thinking about timezones, there are two big benefits to being where I am:
I’m in the median timezone, which makes times that are suitable-for-everybody pretty convenient for me (I have a lot of lunchtime/early-afternoon meetings where I get to
watch the sun rise and set, simultaneously, through my teammates’ windows).
I’m West of the mean timezone, which means that most of my immediate coworkers start their day before me so I’m unlikely to start my day blocked by anything I’m waiting on.
(Of course, this privilege is in itself a side-effect of living close to the meridian, whose arbitrary location owes a lot to British naval and political clout in the 19th century: had
France and Latin American countries gotten their way the prime median would have probably cut through the Atlantic or Pacific oceans.)
2. Language Privilege
English is Automattic’s first language (followed perhaps by PHP and Javascript!), not one of the 120 other languages spoken
by Automatticians. That’s somewhat a consequence of the first language of its founders and the language in which the keywords of most programming languages occur.
It’s also a side-effect of how widely English is spoken, which in comes from (a) British colonialism and (b) the USA using
Hollywood etc. to try to score a cultural victory.
Languages self-reportedly spoken by Automatticians, sized proportional to the number of speakers. No interpretation/filtering has been done, so you’ll see multiple dialects of the
same root language.
I’ve long been a fan of the concept of an international axillary language but I appreciate that’s an idealistic dream whose war
has probably already been lost.
For now, then, I benefit from being able to think, speak, and write in my first language all day, every day, and not have the experience of e.g. my two Indonesian colleagues who
routinely speak English to one another rather than their shared tongue, just for the benefit of the rest of us in the room!
3. Passport Privilege
Despite the efforts of my government these last few years to isolate us from the world stage, a British passport holds an incredible amount of power, ranking fifth or sixth in the world depending on whose passport index you
follow. Compared to many of my colleagues, I can enjoy visa-free and/or low-effort travel to a wider diversity of destinations.
Normally I might show you a map here, but everything’s a bit screwed by COVID-19, which still bars me from travelling to many
places around the globe, but as restrictions start to lift my team have begun talking about our next in-person meetup, something we haven’t done since I first started when I met up with my colleagues in Cape Town and got
assaulted by a penguin.
But even looking back to that trip, I recall the difficulties faced by colleagues who e.g. had to travel to a different country in order tom find an embassy just to apply for the visa
they’d eventually need to travel to the meetup destination. If you’re not a holder of a privileged passport, international travel can be a lot harder, and I’ve definitely taken that for
granted in the past.
I’m going to try to be more conscious of these privileges in my industry.
It just passed two years since I started working at Automattic, and I just made a startling
discovery: I’ve now been with the company for longer than 50% of the staff.
When you hear that from a 2-year employee at a tech company, it’s easy to assume that they have a high staff turnover, but Automattic’s churn rate is relatively low, especially for our
sector: 86% of developers stay longer than 5 years. So what’s happening? Let’s visualise it:
Everything in this graph, in which each current Automattician is a square, explains how I feel right now: still sometimes like a new fish, but in an increasingly big sea.
All that “red” at the bottom of the graph? That’s recent growth. Automattic’s expanding really rapidly right now, taking on new talent at a never-before-seen speed.
Since before I joined it’s been the case that our goals have demanded an influx of new engineers at a faster rate than we’ve been able to recruit, but it looks like things are
improving. Recent refinements to our recruitment process (of which I’ve written about my experience) have helped, but I wonder how much we’ve
also been aided by pandemic-related changes to working patterns? Many people, and especially in tech fields, have now discovered that working-from-home works for them, and a company
like Automattic that’s been built for the last decade and a half on a “distributed” model is an ideal place to see that approach work at it’s best.
We’re rolling out new induction programmes to support this growth. Because I care about our corporate culture, I’ve volunteered
myself as a Culture Buddy, so I’m going to spend some of this winter helping Newmatticians integrate into our (sometimes quirky, often chaotic) ways of working. I’m quite excited to be
at a point where I’m in the “older 50%” of the organisation and so have a responsibility for supporting the “younger 50%”, even though I’m surprised that it came around so quickly.
Automattic… culture? Can’t we just show them Office Today and be done with it?
I wonder how that graph will look in another two years.
Unsurprisingly my checkins, which represent #geocaching/#geohashing activity,
grow in the spring and peak in the summer when the weather’s better!
At first I assumed the notes peak in November might have been thrown off by a single conference, e.g. musetech, but it turns out I’ve
just done more note-friendly things in Novembers, like Challenge Robin II and my Cape Town
meetup, which are enough to throw the numbers off.
As I mentioned last year, for several years I’ve collected pretty complete historic location data from GPSr devices I carry with me everywhere, which I collate in a personal μlogger server.
Going back further, I’ve got somewhat-spotty data going back a decade, thanks mostly to the fact that I didn’t get around to opting-out of Google’s location tracking until only a few years ago (this data is now
also housed in μlogger). More-recently, I now also get tracklogs from my smartwatch, so I’m managing to collate more personal
location data than ever before.
The blob around my house, plus some of the most common routes I take to e.g. walk or cycle the children to school.
A handful of my favourite local walking and cycling routes, some of which stand out very well: e.g. the “loop” just below the big blob represents a walk around the lake at Dix Pit;
the blob on its right is the Devils Quoits, a stone circle and henge that I thought were sufficiently interesting that
I made a virtual geocache out of them.
The most common highways I spend time on: two roads into Witney, the road into and around Eynsham, and routes to places in Woodstock and North Oxford where the kids have often had
classes/activities.
I’ve unsurprisingly spent very little time in Oxford City Centre, but when I have it’s most often been at the Westgate Shopping Centre,
on the roof of which is one of the kids’ favourite restaurants (and which we’ve been able to go to again as Covid restrictions have lifted, not least thanks to their outdoor seating!).
One to eight years ago
Let’s go back to the 7 years prior, when I lived in Kidlington. This paints a different picture:
For the seven years I lived in Kidlington I moved around a lot more than I have since: each hotspot tells a story, and some tell a few.
This heatmap highlights some of the ways in which my life was quite different. For example:
Most of my time was spent in my village, but it was a lot larger than the hamlet I live in now and this shows in the size of my local “blob”. It’s also possible to pick out common
destinations like the kids’ nursery and (later) school, the parks, and the routes to e.g. ballet classes, music classes, and other kid-focussed hotspots.
I worked at the Bodleian from early 2011 until late in 2019, and so I spent a lot of time in
Oxford City Centre and cycling up and down the roads connecting my home to my workplace: Banbury Road glows the brightest, but I spent some time on Woodstock Road too.
For some of this period I still volunteered with Samaritans in Oxford, and their branch – among other volunteering hotspots
– show up among my movements. Even without zooming in it’s also possible to make out individual venues I visited: pubs, a cinema, woodland and riverside walks, swimming pools etc.
Less-happily, it’s also obvious from the map that I spent a significant amount of time at the John Radcliffe Hospital, an unpleasant reminder of some challenging times from that
chapter of our lives.
The data’s visibly “spottier” here, mostly because I built the heatmap only out of the spatial data over the time period, and not over the full tracklogs (i.e. the map it doesn’t
concern itself with the movement between two sampled points, even where that movement is very-guessable), and some of the data comes from less-frequently-sampled sources like Google.
Eight to ten years ago
Let’s go back further:
Back when I lived in Kennington I moved around a lot less than I would come to later on (although again, the spottiness of the data makes that look more-significant than it is).
Before 2011, and before we bought our first house, I spent a couple of years living in Kennington, to the South of Oxford. Looking at
this heatmap, you’ll see:
I travelled a lot less. At the time, I didn’t have easy access to a car and – not having started my counselling qualification yet – I
didn’t even rent one to drive around very often. You can see my commute up the cyclepath through Hinksey into the City Centre, and you can even make out the outline of Oxford’s Covered
Market (where I’d often take my lunch) and a building in Osney Mead where I’d often deliver training courses.
Sometimes I’d commute along Abingdon Road, for a change; it’s a thinner line.
My volunteering at Samaritans stands out more-clearly, as do specific venues inside Oxford: bars, theatres, and cinemas – it’s the kind of heatmap that screams “this person doesn’t
have kids; they can do whatever they like!”
Every map tells a story
I really love maps, and I love the fact that these heatmaps are capable of painting a picture of me and what my life was like in each of these three distinct chapters of my life over
the last decade. I also really love that I’m able to collect and use all of the personal data that makes this possible, because it’s also proven useful in answering questions like “How
many times did I visit Preston in 2012?”, “Where was this photo taken?”, or “What was the name of that place we had lunch when we got lost during our holiday in Devon?”.
There’s so much value in personal geodata (that’s why unscrupulous companies will try so hard to steal it from you!), but sometimes all you want to do is use it to draw pretty heatmaps.
And that’s cool, too.
How these maps were generated
I have a μlogger instance with the relevant positional data in. I’ve automated my process, but the essence of it if you’d like to try it yourself is as follows:
First, write some SQL to extract all of the position data you need. I round off the latitude and longitude to 5 decimal places to help “cluster” dots for frequency-summing, and I raise
the frequency to the power of 3 to help make a clear gradient in my heatmap by making hotspots exponentially-brighter the more popular they are:
This data needs converting to JSON. I was using Ruby’s mysql2 gem to
fetch the data, so I only needed a .to_json call to do the conversion – like this:
db =Mysql2::Client.new(host: ENV['DB_HOST'], username: ENV['DB_USERNAME'], password: ENV['DB_PASSWORD'], database: ENV['DB_DATABASE'])
db.query(sql).to_a.to_json
Approximately following this guide and leveraging my Mapbox
subscription for the base map, I then just needed to include leaflet.js, heatmap.js, and leaflet-heatmap.js before writing some JavaScript code
like this:
body.innerHTML ='<div id="map"></div>';
let map = L.map('map').setView([51.76, -1.40], 10);
// add the base layer to the map
L.tileLayer('https://api.mapbox.com/styles/v1/{id}/tiles/{z}/{x}/{y}?access_token={accessToken}', {
maxZoom:18,
id:'itsdanq/ckslkmiid8q7j17ocziio7t46', // this is the style I defined for my map, using Mapbox
tileSize:512,
zoomOffset:-1,
accessToken:'...'// put your access token here if you need one!
}).addTo(map);
// fetch the heatmap JSON and render the heatmap
fetch('heat.json').then(r=>r.json()).then(json=>{
let heatmapLayer =new HeatmapOverlay({
"radius":parseFloat(document.querySelector('#radius').value),
"scaleRadius":true,
"useLocalExtrema":true,
});
heatmapLayer.setData({ data: json });
heatmapLayer.addTo(map);
});
West Germany’s 1974 World Cup victory happened closer to the first World Cup in 1930 than to today.
The Wonder Years aired from 1988 and 1993 and depicted the years between 1968 and 1973. When I watched the show, it felt like it was set in a time long ago. If a new Wonder
Years premiered today, it would cover the years between 2000 and 2005.
Also, remember when Jurassic Park, The Lion King, and Forrest Gump came out in theaters? Closer to the moon landing than today.
…
These things come around now and again, but I’m not sure of the universal validity of observing that a memorable event is now closer to another memorable event than it is to the present
day. I don’t think that the relevance of events is as linear as that. Instead, perhaps, it looks something like this:
Recent events matter more than ancient events to the popular consciousness, all other things being equal, but relative to one another the ancient ones are less-relevant and
there’s a steep drop-off somewhere between the two.
Where the drop-off in relevance occurs is hard to pinpoint and it probably varies a lot by the type of event that’s being remembered: nobody seems to care about what damn terrible thing
Trump did last month or the month before when there’s some new terrible thing he did just this morning, for example (I haven’t looked at the news yet this morning, but honestly
whenever you read this post he’ll probably have done something awful).
Nonetheless, this post on Wait But Why was a fun distraction, even if it’s been done before. Maybe the last time it happened was so long ago it’s irrelevant now?
Of course, there’s a relevant XKCD. And it was published closer to the theatrical releases of Cloudy with a Chance of Meatballs and
Paranormal Activity than it was to today. OoooOOoooOOoh.
Lots of interesting results from the @bodleianlibs staff survey. Pleased to have my suspicions confirmed about my department’s propensity
to be accepting of individuals: it’s the only one where a majority of people strongly agreed with the statement “I feel able to by myself at work” and one of only two where
nobody disagreed with it. That feels like an accurate representation of my experience with my team these last 7-8 years!
If you’ve spent any time thinking about complex systems, you surely understand the importance of networks.
Networks rule our world. From the chemical reaction pathways inside a cell, to the web of relationships in an ecosystem, to the trade and political networks that shape the course of
history.
Or consider this very post you’re reading. You probably found it on a social network, downloaded it from a computer network, and are currently deciphering it with
your neural network.
But as much as I’ve thought about networks over the years, I didn’t appreciate (until very recently) the importance of simple diffusion.
This is our topic for today: the way things move and spread, somewhat chaotically, across a network. Some examples to whet the appetite:
Infectious diseases jumping from host to host within a population
Memes spreading across a follower graph on social media
A wildfire breaking out across a landscape
Ideas and practices diffusing through a culture
Neutrons cascading through a hunk of enriched uranium
A quick note about form.
Unlike all my previous work, this essay is interactive. There will be sliders to pull, buttons to push, and things that dance around on the screen. I’m pretty excited about this,
and I hope you are too.
So let’s get to it. Our first order of business is to develop a visual vocabulary for diffusion across networks.
A simple model
I’m sure you all know the basics of a network, i.e., nodes + edges.
To study diffusion, the only thing we need to add is labeling certain nodes as active. Or, as the epidemiologists like to say, infected:
This activation or infection is what will be diffusing across the network. It spreads from node to node according to rules we’ll develop below.
Now, real-world networks are typically far bigger than this simple 7-node network. They’re also far messier. But in order to simplify — we’re building a toy model here — we’re going
to look at grid or lattice networks throughout this post.
(What a grid lacks in realism, it makes up for in being easy to draw ;)
Except where otherwise specified, the nodes in our grid will have 4 neighbors, like so:
And we should imagine that these grids extend out infinitely in all directions. In other words, we’re not interested in behavior that happens only at the edges of the network, or as
a result of small populations.
Given that grid networks are so regular, we can simplify by drawing them as pixel grids. These two images represent the same network, for example:
Alright, let’s get interactive.
…
Fabulous (interactive! – click through for the full thing to see for yourself) exploration of network
interactions with applications for understanding epidemics, memes, science, fashion, and much more. Plus Kevin’s made the whole thing CC0 so everybody can share and make use of his work. Treat as a longread with some opportunities to play as you go
along.
observed that e.g. /r/MegaLoungeV is a highly-active sub, which isn’t necessarily what you’d expect if the activity level was based
entirely upon the number of people permitted to have access.
I started wondering if there might be a better predictor of engagement levels. I experimented by looking at the ratio of how many people ‘subscribe’ to each MegaLounge to
how many people are permitted into there. This isn’t a perfect measure of engagement, of course, but my thinking was that of the people who are invited into a MegaLounge, only
some of those will add it to their front page… but that those who add it to their front page are more-likely to actively participate.
The graph shows three things. From the left axis, the blue and red lines show the number of people who are allowed into each MegaLounge and the number of people who are
subscribed to each MegaLounge. As you might expect, there’s a gap between the two and the gap narrows in higher lounges, where there are fewer people.
But what I was interested in is whether and where this gap changes proportionally: is the “subscription rate” among eligible people higher in particular lounges, and can this been seen
as a predictor of activity levels and engagement rates in each lounge? That’s what the green bars show (against the right-hand axis: note that it doesn’t start at zero). In general,
across the MegaLounges up to and including /r/MegaLoungeSol, there does seem to be a slight upward trend: i.e. the higher a
lounge you’re in, the more-likely that eligible people are to add that lounge to their front page. Beyond /r/MegaLoungeSol the bars
jump around all over the place, probably because of the small number of people ‘up there’, and I suggest that we ignore them: accuracy of this as a predictor would be expected to be
better where there were more subscribers: say, up to about /r/MegaLoungeDiamond.
What this would predict would be a “lull” at /r/MegaLoungeVIII. I don’t know if that’s your experience or not. And, interestingly,
the ‘subscription ratio’ at /r/MegaLoungeV and /r/MegaLoungeX are also unusually
low, bucking the overall trend. What does this mean? I don’t know. But if /r/MegaLoungeV really is to be considered one of
the more “active” MegaLounges, then I think that we can safely say that my hypothesis – that we might be able to predict activity hotspots by looking at the subscription rate – is not
backed up by the data.
As described over there, I’ve come up with a way to make
graphs of the speed of everybody’s ascension through the MegaLounges. But because everybody up here in /r/MegaLoungeIndia is far too
important to have to ask for things for themselves, I’ve pre-generated graphs for you all. Here they are:
Thanks to the data that /r/MegaMegaMonitor collates, I’ve been able to throw together some fun statistics about the MegaLounge chain. Let me share
some graphs with you:
Gildings per MegaLounge
As you’d probably expect, the lower (and more-populous) MegaLounges see more gilding than the higher ones. But…
MegaLounge generosity ratio
…if we take the ratio of the number of gildings per MegaLounge to the number of accounts in that MegaLounge, we get a ‘generosity score’, and that’s not so
clear-cut. The most generous MegaLounges are /r/MegaLoungeSol and the secret MegaLounge, followed by /r/MegaLoungeDiamond, /r/MegaLoungeRussia, /r/MegaLoungeVII, and /r/MegaLoungeVI. I wonder if these are also the lounges with the least content? Perhaps they’re places that people stop off at only as a steping stone on their
journey?
The least-generous MegaLounge is the latest one, probably because there’s nowhere to go yet as a result of being gilded there, but the second least-generous… is /r/MegaLounge itself! I feel like I should go there and gild some people, just to buck the trend!
This is all just something fun I threw together while I’ve been off work sick today. If there’s anything else that anybody would like to see extracted from MMM’s data, let me know:
it’s all interesting stuff!