Thursday 20 June 2013

3 billion tweets on a map

It's been hard these past few days to find anyone willing to say a bad word about the new Locals and Tourists map from that magic combination...MapBox, Eric Fischer and Gnip. All capable of producing cool stuff. This is broadly what you get from their collaboration...a multiscale cached map of the world showing 3 billion dots, each one red or blue:

The MapBox blog explains it in more detail but essentially it's every geotagged tweet since September 2011, categorised as either red, to indicate a tweet from a tourist, or blue, a tweet from a local.

It's impressive in scale and scope. It looks beautiful. The crowd on the interwebs are going gaga for it. I like it as a piece of map porn along with the multitude of other such works I'd put in that category.

Where I perhaps differ from some, though, is in casting a critical eye on the actuality of the map and bypassing the hype and rhetoric, so here are a few thoughts...

The map claims to show 'incredible new detail', 'demographic, cultural and social patterns down to city level'. It apparently allows you to 'explore stories of space, language and access to technology'.  OK, if they'd left it at "here's a cool looking map, whaddya think" I'd have bought it but c'mon.

The points are just geotagged tweets. Best estimates suggest only 1% of tweets are geotagged. Some 13% of them don't have exact coordinates. It might be reasonable to suggest we're looking at a sample of all tweeters but is the sample the same or sufficiently similar across all places to make the assumption we can compare like for like? We don't know.

And what of the demographics of Twitter use? 67% of all internet users use social media. People who live in cities spend more time on social media. Only 16% of those who use social media use Twitter and they are most likely to be adults aged 18 to 29...and male.

In short, the data itself is full of error, bias and uncertainty. You can't make any sensible inferences from a map like this. You certainly cannot claim to be able to explore the map in the way suggested.

And then we get to the classification of the data. Tweets are blue if the user has posted in the same place for a consecutive month and red for users whose tweets are normally elsewhere. And from that, they've decided they are able to identify locals and tourists. Spurious to say the least. There are probably hundreds of ways that this rule can be broken and someone can move from the red to the blue or vice versa. It may be a convenient hook for the map title but as a classification it is based in fantasy.

Finally we get to the map...take a look and check out places you are familiar with. Does it stack up? More likely than not the answer is no. There appears to be a much larger element of so-called 'tourism' in places you'd hardly classify as tourist spots, and where there are tourist honeypots or towns...there's no discernible difference. Roads are predominantly made up of red dots, as are motorway service stations and anywhere else that people travel through.

The map is largely empty in areas where you'd expect lower levels of mobile phone usage (most of the developing world) but we know this already. The map doesn't tell us anything new. Does it give us an insight into large scale city level detail? Not really...it's patchy at best, and why not just use a decent street map if that's what you need to find out?

Finally, I was curious about how the red and blue dots were displayed. Having just spent quite some time making a dasymetric dot density map of the 2012 Presidential election results I'm quite familiar with the problems of representing a lot of blue and red dots in close proximity. In order to give equal visual weighting to each colour you have to do a little bit of processing of the data (well, quite a bit actually). So what have they done with the Twitter map? I re-engineered the map and played around with the tiles and it's clear that they've just overprinted the blue dots with the red dots. One layer on top of another. In visual terms, then, the map gives far more prominence to the red 'tourist' dots which, of course, makes for a more interesting map. Here's a quick comparison:

Two completely different maps. Two completely different impressions of the data...but users of the map don't get that choice; they get the map on the left and base their interpretations on that...incorrectly.
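
To make the draw-order point concrete, here is a minimal sketch in Python. It uses synthetic, uniformly random dots on a tiny 50 x 50 pixel grid, not the real tweet data, and the function names (`render`, `visible_counts`, `random_dots`) are my own for illustration. It is not how the MapBox tiles are actually built. It simply paints equal numbers of red and blue dots in three different orders and counts which colour ends up visible, with a random sort key per dot as one possible way of balancing the two.

```python
import random

def render(dots, width, height):
    """Paint dots in list order; a later dot overwrites an earlier one on the same pixel."""
    grid = [[None] * width for _ in range(height)]
    for x, y, colour in dots:
        grid[y][x] = colour
    return grid

def visible_counts(grid):
    """Count how many pixels end up showing each colour."""
    counts = {"red": 0, "blue": 0}
    for row in grid:
        for cell in row:
            if cell is not None:
                counts[cell] += 1
    return counts

def random_dots(n, colour, width, height, rng):
    """Synthetic stand-in for geotagged tweets: n dots at uniformly random pixels."""
    return [(rng.randrange(width), rng.randrange(height), colour) for _ in range(n)]

rng = random.Random(42)  # fixed seed so the run is repeatable
WIDTH, HEIGHT = 50, 50
blue = random_dots(5000, "blue", WIDTH, HEIGHT, rng)
red = random_dots(5000, "red", WIDTH, HEIGHT, rng)

# Layered drawing: whichever colour is painted last wins every shared pixel.
red_on_top = visible_counts(render(blue + red, WIDTH, HEIGHT))
blue_on_top = visible_counts(render(red + blue, WIDTH, HEIGHT))

# Mixed drawing: give every dot a random key, then paint in key order.
keyed = [(rng.random(), dot) for dot in blue + red]
keyed.sort(key=lambda pair: pair[0])
mixed = visible_counts(render([dot for _, dot in keyed], WIDTH, HEIGHT))

print("red on top: ", red_on_top)
print("blue on top:", blue_on_top)
print("mixed order:", mixed)
```

The underlying data is identical in all three runs and so is the number of coloured pixels; only the paint order changes which colour the reader sees. With real data the imbalance would depend on how densely the dots overlap at a given zoom level.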

Let's get one thing straight...as a piece of map porn it's spectacular. It also demonstrates that technologically, the mapping world is really at the cutting edge of handling large data sets (not 'big data' per se...this is just a lot of data). The point of this blog, though, is to counter the common perception of this sort of work. There are big questions regarding bias, error and uncertainty and how we represent this sort of data for mass public consumption. Is it useful? Can we really find anything out from it? Or is it just 'art'? Maps have always lied, it's true, but far more eyes are cast over work these days simply because of its accessibility online and the way social media itself transmits the latest and greatest ideas...but the internet has no quality control, no quality assurance and, as with everything out there, if you consume it without taking a moment to think about what it is you're looking at you just drink the Kool-Aid.

Enjoy it for what it is but don't be taken in by the rhetoric. It is what it is, nothing more...despite what the label says and what all those 'likes' and re-tweets imply.


  1. Thanks for the analysis. It is really unfortunate that the layering worked out the way it did. The original had everything composited together but we made it separate layers so they could be toggled, and it was a bad idea because of the precedence. Trying to figure out now how to make it more honest.

    Interesting point about the difference between travel and tourism. They are kind of the same thing, but I guess there is a difference in intent: stopping because you have to vs. stopping because you are interested in something. I don't know if there is any way to discern intent from tweets, but photos are probably better because they do indicate that you were looking for interesting things.

    I wish the sample size was larger, but I think the density is still legitimate, because counts of geotagged tweets on streets do correlate well with pedestrian counts on those streets. The big places where it falls apart are retirement homes, because that age group really doesn't use Twitter much.

    1. Eric, thanks for contributing and taking up your right to reply!

      Regarding the dots...yes, the desire to toggle will lead to that issue. In my own efforts in this space (admittedly only 13.5 million dots) I merged the two classified sets of dots by assigning each a random number and then reordering them...hence creating the illusion that red and blue were mixed in visual balance. It seemed to work. You could have a 'red', a 'blue' and a 'mixed' version?