Thursday 15 December 2011

Another day, another choropleth failure

Whilst over at the Guardian's Data Blog scratching my eyes out over their Primary School league table map, I happened across another recent map that Google Fusion Tables are being blamed for. Here we see the number of US forces casualties of the Iraq conflict:

Casualties mapped as totals, on a choropleth, taking no account of differences in the size of the areas or the population as a denominator. Wow...California and Texas have been decimated. Great use of solid black...nicely illustrates the picture of death. Wonder what colour they'd have gone for if they'd had more deaths to show? Nice use of overlapping classes in the legend as well (and yes, I'm being sarcastic on both these points).
I zoomed in to check Texas...

...and then I thought about the border area between Pennsylvania and New York State...

Hmm...what's the point of multi-scale here?

These maps break just about every known convention in choropleth mapping as well as being largely pointless in multi-scale format. Why do we have cartographic conventions? Well, because human beings interpret patterns, colours and differences between quantities in very particular ways. Red has connotations, black even more so. Showing different shades of colours across an area map means we compare one place to another...so the map tells us California is clearly affected to a much greater extent than, say, Montana. Just when I figured I was going to have to re-build the map to show how it ought to be mapped I saw the drop down menu and hey presto...they have an option to show the Number of US military personnel killed per 100,000 of each State's population. Here it is:

Now then, that wasn't so difficult. Now we have a map where we can accurately infer comparisons between States. Now that we are taking account of the differences in population size between the States, the pattern is very different to the original. No surprise there, given the much larger populations in Texas and California, which make it inevitable that more people would be in the military and, logically, more would die. But as a proportion of the population their losses are less than many other States. A good use of pop-ups to show totals (and other metrics/details) as well:

Mapped appropriately we see just how badly Vermont is affected. It didn't even register on the totals version but here, as a proportion, it's got its own class (3-19 people per 100,000) and colour (black!). The pop-up reveals 19 killed in action:

Wait, is that what's being mapped? The pop-up says that 3.535 people per 100,000 were killed in action. Sure, it's in the highest class so it's correct, and Vermont DOES have the highest proportion, so what's with the 3-19 class in the legend? That, my friends, is an 'error'. It's probably just a typo, a slip of the keyboard where someone saw the number 19 (killed in action) and used it in the legend as the upper extent of the class. It's a minor one but here's the thing...the legend is what people see, it's what people believe. Unless you go digging in the pop-up, the map suggests Vermont is way out on its own. It is...but only by about 0.6 per 100,000 given Montana's death rate is 2.96. In fact, Montana is some way from the next nearest (Wyoming, 2.6 killed per 100,000) so one could reasonably argue Montana and Vermont are similar in characteristics and should share the highest class. After all, 2.96 rounds to 3, which puts it in the upper class...or the one beneath it, depending where 3 goes.
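The question of 'where 3 goes' is easy to settle in code. Here's a minimal sketch in Python of non-overlapping classes where every break value belongs to the class above it, and where the top legend label ends at the highest mapped rate rather than at a stray count. The break values (1, 2, 3), the lower bound of 0 and the function names are my own assumptions for illustration; only the three rates come from the pop-ups discussed above.

```python
from bisect import bisect_right

def classify(value, breaks):
    """Return a class index; each class is [lower, upper), so a value
    exactly on a break always lands in the class above it."""
    return bisect_right(breaks, value)

def legend_labels(breaks, data_min, data_max):
    """Build non-overlapping legend labels from the breaks and the data range."""
    edges = [data_min] + list(breaks) + [data_max]
    labels = []
    for i in range(len(edges) - 1):
        lower, upper = edges[i], edges[i + 1]
        if i == len(edges) - 2:
            # The top class is closed at the real data maximum.
            labels.append(f"{lower:g} to {upper:g}")
        else:
            labels.append(f"{lower:g} to under {upper:g}")
    return labels

# Rates per 100,000 quoted in the post.
rates = {"Vermont": 3.535, "Montana": 2.96, "Wyoming": 2.6}

# Hypothetical class breaks, NOT the ones used on the Guardian map.
breaks = [1.0, 2.0, 3.0]

classes = {state: classify(rate, breaks) for state, rate in rates.items()}
labels = legend_labels(breaks, 0.0, max(rates.values()))
```

With these assumed breaks Montana (2.96) sits in the class below Vermont, and a State with a rate of exactly 3 would join Vermont. Shift the top break down a touch and Montana and Vermont share the highest class, which is exactly the sort of judgement a cartographer should be making deliberately rather than by accident.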

I'd encourage The Guardian to do more of these proper choropleths...just ditch the totals version and then begin to refine the classification...it really isn't that difficult!
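For anyone wondering what ditching the totals involves, it amounts to dividing each count by a population denominator before mapping. A minimal sketch in Python follows; the two 'States' and their figures are invented purely for illustration and are not taken from the Guardian's data.

```python
def rate_per_100k(count, population):
    """Express a raw count as a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000

# Made-up figures for illustration only (not real casualty or census data).
states = {
    "Big State": {"deaths": 400, "population": 25_000_000},
    "Small State": {"deaths": 20, "population": 600_000},
}

# Map the rates, not the raw totals.
rates = {
    name: rate_per_100k(d["deaths"], d["population"])
    for name, d in states.items()
}
```

In this invented example the big State has twenty times the deaths but roughly half the rate of the small one, which is the same reversal we saw between the totals map and the per 100,000 map.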

1 comment:

  1. nice write-up! I am afraid there is lots of work to do for you! Cheers, Jw.