Wednesday 22 August 2012

Getting steamed up over heat maps

Seen Amazon's Election Heat Map 2012? Here's a screen grab to whet your appetite...

Notice anything strange? (@briantimoney did, which is where I picked this one up.) It's a Heat Map. A what? I said a 'heat map'. What the blue blazes is a 'heat map'? I'll tell you what it is: a made-up name to make a map seem more edgy, more sensationalist. What you're looking at is a sexed-up choropleth map, a perfectly good mapping technique given a new name by Amazon because, I guess, the term choropleth seemed a little outmoded. Why not go the whole hog and call it a heat-o-graphic? Actually, why not just give the map a decent title like 'Election reading preferences 2012'? Tell it how it is. Call a spade a spade and we get it. Instead, we have to wade through a nonsensical title and then a subtitle, "What are Americans Reading".

It's along the same lines as 'hot spot map', a map that shows clusters of something. The word cluster is a perfectly good descriptor and has the benefit of being a single, shorter word, yet 'hot spot' is somehow preferred. Does it make the map more plausible, more understandable or merely more sensationalist?

Now that the point of this blog has been vented, let's dig a little deeper and see if we can't find some other interesting perspectives...

The map itself shows the distribution of 'political book' purchases from Amazon in the last 30 days, where the books have been categorised as red (Republican Party), blue (Democratic Party) or neutral according to their political leaning. Each state's colour is assigned by comparing the percentage of purchases of the top 250 selling blue books with the equivalent red books. States with a higher percentage of one type over the other are coloured in darker shades.
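To make the colouring rule concrete, here is a minimal sketch of how a state's colour and shade might be derived. It assumes the 'percentage' is the red share of combined red and blue sales; the function name `state_colour` and the 10 and 20 percentage-point shade thresholds are hypothetical, since I don't know Amazon's actual class breaks.

```python
def state_colour(red_sales: int, blue_sales: int) -> tuple[str, str, float]:
    """Return (colour, shade, red_pct) for one state.

    Assumptions (hypothetical, not Amazon's published method):
    red_pct is the red share of combined red + blue sales, and the
    shade depends on how far that share sits from an even 50/50 split.
    """
    total = red_sales + blue_sales
    if total == 0:
        return ("neutral", "none", 50.0)
    red_pct = 100.0 * red_sales / total
    if red_pct == 50.0:
        return ("neutral", "none", red_pct)
    colour = "red" if red_pct > 50.0 else "blue"
    margin = abs(red_pct - 50.0)  # percentage points away from an even split
    if margin >= 20.0:
        shade = "dark"
    elif margin >= 10.0:
        shade = "medium"
    else:
        shade = "light"
    return (colour, shade, red_pct)


# A 51/49 split is coloured red just like a landslide; only the shade differs.
print(state_colour(51, 49))  # ('red', 'light', 51.0)
print(state_colour(20, 80))  # ('blue', 'dark', 20.0)
```

Note how a one-point margin still flips the whole state to red, which is why a 51% lean reads as a 'red' state on the map.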

So what does the map show? It clearly shows a strong leaning towards Republican-favoured reading material (oddly, for Brits this is a source of constant confusion... Republicans are right-leaning conservatives, which we designate using blue in the UK!!!). There are a few problems with this. Books are rarely so overtly red or blue, so making such a divisive classification adds unfathomable bias to the map. Does it show a Republican wave taking the US by storm? Does it show that Republicans tend to read political books more? Does it show that Democrats are buying Republican reading material to figure out what alternative policies and views might be out there? Or maybe it shows that publishers are more likely either to publish right-leaning books or, perhaps, to market them more effectively to an audience more willing to consume them? What is clear is that the map does not in any way reflect possible voting patterns, yet because it uses the same approach we commonly see on choropleths (sorry, heat maps) of election results, we cannot help but make that association.

As a way of seeing what's selling in spatial terms, it's quite useful. You can click on each state and see the top-selling political books of each persuasion. Here's California's selection. Although it shows a 51% lean to the red, it's worth noting that California is an avowed Democratic state, suggesting that those who read might not be those who vote:

Now wait just a minute... The Price of Inequality on the left and both The Amateur and Killing Lincoln on the right are each listed twice. That means they are counted twice, which means the assumption has been made that each format of a book represents a separate, independent purchase. What if people buy multiple editions of the same book? The map could then simply be showing that Republicans, for some reason, tend to buy multiple editions of the same book more than Democrats do. So we also potentially have double-counting in the data to contend with.
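Here is a sketch of how collapsing multiple formats of the same title changes the tally. The listings are illustrative only: the three real titles are the ones named above, while the format labels and the extra blue title ('Blue title B') are hypothetical placeholders, and treating each listing as one count is a simplification of real purchase data.

```python
from collections import Counter

# (title, format, side) - formats and "Blue title B" are hypothetical placeholders
listings = [
    ("The Price of Inequality", "Hardcover", "blue"),
    ("The Price of Inequality", "Kindle Edition", "blue"),
    ("Blue title B", "Hardcover", "blue"),
    ("The Amateur", "Hardcover", "red"),
    ("The Amateur", "Kindle Edition", "red"),
    ("Killing Lincoln", "Hardcover", "red"),
    ("Killing Lincoln", "Kindle Edition", "red"),
]


def red_share(counts: Counter) -> float:
    """Percentage of red items among red + blue."""
    return 100.0 * counts["red"] / (counts["red"] + counts["blue"])


# Raw: every listing counts, so each extra format inflates its side.
raw = Counter(side for _, _, side in listings)
# Deduplicated: one count per distinct title, whatever the format.
unique_titles = {(title, side) for title, _, side in listings}
dedup = Counter(side for _, side in unique_titles)

print(f"raw:   {dict(raw)} -> {red_share(raw):.1f}% red")
print(f"dedup: {dict(dedup)} -> {red_share(dedup):.1f}% red")
```

In this toy example the red share drops from about 57% to 50% once duplicate formats are collapsed.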

So it's not actually an election map at all, let alone a heat map. In fact, it doesn't even go so far as to show us whether 'other' books far outweigh political reading material, which would be interesting in its own right. What we actually have, then, is a choropleth map showing spurious sales data in its crudest form. To be fair to Amazon, they don't try to hide what the map shows or how it has been represented. I guess they just decided that a choropleth of sales data isn't a particularly sexy map to get people interested in...

Wednesday 15 August 2012

Does it matter if the map is wrong?

Dear @BBC_Magazine,
Today you had a nice little article online asking 'Olympic Counties: Does it matter where medal-winners come from?'. You even begin with the phrase 'A geographical breakdown of Team GB's Olympic success reveals some areas with more prizes than others'. It's a pleasant enough commentary on social and spatial identity and belonging, and on the way in which towns and cities across the UK are claiming association with Olympic medalists... and you point out, in a very British way, the error of most of this with facts about the real place where Andy Murray was born or how Yorkshire is in fact not a single place etc etc.

The trouble is, you then go and spoil all that 'my county is more important than yours' with a map that fails in truly Olympian fashion:

Ollie O'Brien (@oobr) was astute enough to take a screen grab (thanks to him for passing it on to me) and pointed out the first wave of mistakes on Twitter (Aberdeen not being in Aberdeenshire, Plymouth not having a medal, Edinburgh in the wrong place etc).

The map seemed to be very rapidly 'fixed' with a series of iterations during the day. On the one hand, I applaud the BBC for taking note of Ollie's constructive criticism and making corrections rather than ignoring them. On the other hand, you've missed a belter of an error. Here is the map as it was still being shown the following day...

You've gone and done it again... mapped raw numbers using a choropleth (the BBC is developing a habit of doing this).

Mapping totals in this way is completely spurious. You've got to adjust for the different sized areas and populations (normalise). Yes, London is a big city: Greater London has a population of 8,278,251 and 10 medal winners. That's 0.12 medal winners per 100,000 people. Cardiff has a population of 346,100 and 4 medal winners. That equates to 1.16 medal winners per 100,000 people. So the population of Greater London is nearly 24 times larger than Cardiff's, and it's therefore perfectly reasonable to expect (all other things being equal) that more medalists will hail from the Greater London area, despite Cardiff actually doing far better in relative terms. The point is that comparing one area's medal haul with another's requires you to adjust the raw counts so that readers can compare them on a per capita basis. As it stands, your map presents a meaningless distribution.
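The normalisation is simple arithmetic. This sketch reproduces the figures above; `per_100k` is just an illustrative helper name.

```python
def per_100k(count: int, population: int) -> float:
    """Rate of `count` per 100,000 people."""
    return count / population * 100_000


# Medal winners and populations as quoted above
areas = {
    "Greater London": (10, 8_278_251),
    "Cardiff": (4, 346_100),
}

for name, (winners, population) in areas.items():
    print(f"{name}: {per_100k(winners, population):.2f} medal winners per 100,000")

# How many times larger Greater London's population is than Cardiff's
print(f"Population ratio: {8_278_251 / 346_100:.1f}")
```

This prints 0.12 per 100,000 for Greater London, 1.16 for Cardiff and a population ratio of 23.9, so Cardiff's rate is roughly nine and a half times London's despite the smaller raw total.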

Other than that glaring error... nice colours. Check. Clear labels. Check. All going well so far, but wait... this is an article about assigning medal counts to place of birth, and yet you assign medals to the cities of Aberdeen, Glasgow, Edinburgh and Cardiff and others to large counties. And that's after citing work from Dr Garner of Aston University, whose work makes clear that people's loyalties are much more localised (estates, areas and, at most, a town or city). You've used a nicely inconsistent mix of English county boundaries, Scottish council areas, Welsh unitary authorities and Northern Irish council areas. What happened to English unitary authorities? In fact, one area was shown as a unitary authority without a medal on version one, then it magically disappeared in version 2. Maybe a proportional symbol map centred on cities would have been a better approach, if only for consistency? While we're talking about consistency, are you labelling counties or cities? You label London yet show the Greater London boundary.

So I re-did the map.  The map below shows the per-capita distribution on the left and whaddya know...actually London didn't do so well in relative terms.  I also fixed the issues with the boundaries so there is at least consistency.  And if you still insist on mapping totals, the version on the right maps the raw medal totals as a proportional symbol map.
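For a proportional symbol map, a common convention is to scale each circle so that its area, not its radius, is proportional to the value; scaling the radius directly makes large totals look disproportionately dominant. A minimal sketch of that square-root scaling, where `symbol_radius` and the 20-unit maximum radius are hypothetical choices:

```python
import math


def symbol_radius(value: float, max_value: float, max_radius: float = 20.0) -> float:
    """Radius for a proportional circle whose area scales with `value`.

    The largest value gets `max_radius`; since area = pi * r**2,
    r must scale with the square root of the value.
    """
    if value <= 0 or max_value <= 0:
        return 0.0
    return max_radius * math.sqrt(value / max_value)


# Medal-winner totals quoted above: Greater London 10, Cardiff 4
london = symbol_radius(10, max_value=10)
cardiff = symbol_radius(4, max_value=10)
print(f"London r={london:.1f}, Cardiff r={cardiff:.1f}")
print(f"Area ratio: {(london / cardiff) ** 2:.2f}")  # 2.50, matching 10 / 4
```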

So, BBC, you wrote an article about the importance of place and identity yet failed monumentally in using that most geographical of tools, a map, to illustrate the facts. In so doing, you destroy any faith the reader has in your report, and for those who don't recognise the errors, you perpetuate misconceptions through erroneous cartography.

Does it matter if the map is wrong?  Well, yes, particularly in an article about geography, place, spatial identity and where the basis of the discussion is about comparisons between places.

PS...Before someone cries foul that I didn't include Northern Ireland or the Isle of Man, Channel Islands or any of the 9 foreign countries that Team GB medalists hail from...I know...I'm sorry...I just didn't have the time or data to go to those lengths and all this blog was supposed to do was make a general point rather than build a perfect replacement.

Thursday 2 August 2012

Mapping NBC Olympic Opening Ceremony coverage

There has been plenty of consternation about the coverage of the Olympics that US broadcaster NBC is subjecting US subscribers to. Given that I now reside in the US, I have seen the abomination first hand. It is truly the most excruciating experience ever. Delayed broadcasts, more ads than athletes, inane commentary and yes (though this was at least partly expected) an almost exclusive focus on US athletes. I wouldn't mind the last of these, because it is a US broadcaster after all, but I'd have hoped for at least some coverage of other nations and of sports that don't have a US medal hope. Anyway, this is a blog about mapping so...

A couple of weeks ago I presented a range of simple thematic web maps I'd built for the 2012 Esri User Conference on medal hauls and competitor counts.  The purpose was really to demonstrate how to design and construct them using ArcGIS so if you're interested you can take a look here (and let me know what you think!).

I was also developing a map of the NBC opening ceremony coverage, but Andrew Shears beat me to it (here) as part of a terrific critique of NBC's coverage. The map he built shows the amount of air time given to each country during the Parade of Nations at the opening ceremony. NBC's heavy editing seemed to present a somewhat unbalanced viewing experience, and I too was keen to map the air time data. Every country was subjected to some comment about their political status, struggles, crises, war and religious 'difference'. Some of the commentary was not just ignorant but bordered on bigotry, with commentators poking fun at country names, customs and suchlike (often mispronounced).

I show my version of Andrew's map below, illustrating those countries that received more 'air time' in darker shades using the same classification scheme as he did.  Andrew called his map 'NBC's Geographic Imagination' and as it stands it serves the purpose of illustrating how the US perceives the rest of the world.

So no surprises, eh? Team USA and the host nation (Great Britain) get huge air time, followed by Australia, China and then the rest of the world. But just a moment... while I entirely concur that many nations were short-changed (particularly those eschewed in favour of ad breaks, or those immediately after Team USA that were largely overlooked), looking at the total air time doesn't really tell the whole story. While it is true that NBC controlled their delayed coverage and made editorial decisions about the time each nation was on air ('edited for an American audience', we are told), surely the number of competitors plays a key part in this metric? Simply put, a nation with hundreds of athletes takes far longer to enter the stadium than a smaller one with only a handful. This, then, is a great example of how mapping totals on a choropleth doesn't always reveal the story (though it does reveal a story). Before I present an alternative, here's the total air time re-classified into quantiles...

It's the same as the first map but just re-classified in terms of the categories each nation falls into.  Now, take a look at the following map...
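Quantile classification ranks the values and splits them into classes holding (roughly) equal numbers of countries, regardless of the gaps between values. A minimal sketch, assuming five classes and hypothetical air time figures for ten made-up countries (A to J); `quantile_classes` is an illustrative name, not a GIS function:

```python
from collections import Counter


def quantile_classes(values: dict[str, float], k: int = 5) -> dict[str, int]:
    """Assign each key a class 0..k-1 so that each class holds ~n/k items.

    Items are ranked by value; the rank determines the class. Tied values
    may land in different classes, a known quirk of quantile schemes.
    """
    ordered = sorted(values, key=lambda name: values[name])
    n = len(ordered)
    return {name: (rank * k) // n for rank, name in enumerate(ordered)}


# Hypothetical air time in seconds; note the very skewed distribution
air_time = {"A": 5, "B": 6, "C": 8, "D": 9, "E": 10,
            "F": 12, "G": 15, "H": 20, "I": 90, "J": 400}
classes = quantile_classes(air_time)
print(classes)
print(Counter(classes.values()))  # two countries in each of the five classes
```

With five equal-interval breaks on these made-up numbers, eight of the ten countries would share the lowest class, which is why quantiles spread the shading more evenly across skewed data.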

This is a map of the total air time for each nation, normalised by taking account of the number of competitors to give air time per athlete.   It's classified using quantiles so it is immediately comparable (in visual terms) to the previous map. And what a different picture!
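The normalisation itself is one division per country. This sketch uses two hypothetical teams (the figures are invented for illustration, not NBC's actual timings) to show how the ranking can flip once team size is taken into account:

```python
def per_athlete(air_time_s: dict[str, float], athletes: dict[str, int]) -> dict[str, float]:
    """Seconds of air time per athlete; countries with no athlete count are skipped."""
    return {c: air_time_s[c] / athletes[c]
            for c in air_time_s if athletes.get(c, 0) > 0}


# Invented figures: a large team and a small one
air_time_s = {"Large team": 60.0, "Small team": 20.0}
athletes = {"Large team": 300, "Small team": 5}

rates = per_athlete(air_time_s, athletes)
print(rates)  # {'Large team': 0.2, 'Small team': 4.0}
print(max(air_time_s, key=air_time_s.get), "leads on total air time")
print(max(rates, key=rates.get), "leads on air time per athlete")
```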

The map can now be viewed as a level playing field (pun intended) and on this basis, one could argue that NBC were actually rather generous to many nations in comparison to coverage of Team USA and the other well represented teams. Certainly, many African nations did well. As you might expect, the main competitors in terms of the all-important medal table (China, Russia) didn't do so well, and competitors in certain disciplines (Australia in swimming, for instance) also received less coverage. So while there was a clear bias in some elements of NBC's editing, the map of total air time doesn't really tell the whole story.

That said, viewing the ceremony did leave one with a sense of foreboding as our brains aren't that good at working out relative proportions on the fly.  In that sense at least, NBC didn't do very much to counter the various accusations of bias and poor editorial judgement.

Finally, is a map the best way to look at this relationship? Take a look at the following scatter graph (which excludes Team USA and Great Britain, as their large teams and large air time make them significant outliers). Number of athletes is on the y (vertical) axis and seconds of air time on the x (horizontal) axis.

So we have a reasonable spread, and there is some sense of a positive correlation (albeit a weak one) showing that air time broadly tracks the number of athletes where a nation has over 100 athletes. But look below that 100-athlete line... there are dozens of nations receiving a lot of air time with relatively few athletes. So in this sense the scatter graph supports the notion that a map normalised by competitors tells a more realistic story.
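The strength of that relationship can be summarised with Pearson's correlation coefficient r, computed after dropping the two outliers as in the graph. A minimal sketch with invented figures for five hypothetical nations (P to T) plus the two excluded teams; `pearson_r` is written out by hand to keep it dependency-free:

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented (seconds of air time, number of athletes) per nation
data = {
    "USA": (300.0, 530), "Great Britain": (280.0, 540),  # outliers, excluded below
    "P": (40.0, 250), "Q": (35.0, 150), "R": (30.0, 20),
    "S": (25.0, 8), "T": (12.0, 110),
}
outliers = {"USA", "Great Britain"}
kept = {name: v for name, v in data.items() if name not in outliers}

seconds = [s for s, _ in kept.values()]
athletes = [a for _, a in kept.values()]
print(f"r = {pearson_r(seconds, athletes):.2f}")

# Nations under the 100-athlete line, with their air time per athlete
for name, (s, a) in kept.items():
    if a < 100:
        print(f"{name}: {s / a:.1f} s per athlete")
```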

Just another example of how maps can mislead the innocent.