Tuesday 17 March 2015

Drug crazed mapping

I had told myself I wasn't going to bite when @Amazing_maps screamed once more for my attention. But the more I tried to ignore, the more it reeled me in so eventually I thought it worth a few comments.

Here's the so-called amazing map:


I've no idea who made it. It doesn't really matter. What I feel matters is the impact maps like this have on those who view them. This is more about the consumption of maps but, of course, their design and construction go a long way to underpinning the message people take away.

Quick look and take away: Holy drug barons Batman,...San Bernardino is full of crack-heads! So are a few smaller areas I don't even know....but they're really small so they can't be as important eh? Right, must be time for Alaska State Troopers, turn on the TV...

That's how a lot of people will look at this map. Message delivered. Warped view of reality perpetuated. Job done. Wait for the next Amazing Map.

Here's the longer look and take aways I formulated...

Hmm. Something's not quite right with this map. Let's talk it through. It's a choropleth. We can assume from the title...well, the line that doubles as the legend title, what the subject matter is. It's about the labs, not the population, so it's about production, not consumption. And the colour scheme goes from light to dark so we see where there are more meth labs and where there are fewer. I'll not repeat myself like a cracked record about it being totals (but it is) and not normalised (but it isn't). Suffice to say it needs to have the data transformed into per capita or something equally sensible to allow us to compare like for like. Though that's critical for a choropleth, let's ignore it for the purposes of this because there are other 'take aways' in this map.
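For what it's worth, the transformation I keep banging on about is trivial. Here's a minimal sketch in Python; the county names and figures are made up purely for illustration (I don't have the map's data), and the function name is my own.

```python
def rate_per_100k(count, population):
    """Return count per 100,000 residents, or None if population is missing or zero."""
    if not population:
        return None
    return count * 100_000 / population


# Made-up figures for illustration only; not taken from the map or any real source.
counties = {
    "Big County": {"labs": 300, "population": 2_000_000},
    "Small County": {"labs": 30, "population": 50_000},
}

rates = {
    name: rate_per_100k(c["labs"], c["population"])
    for name, c in counties.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} labs per 100,000 people")
```

In this invented example the county with ten times the labs ends up with a quarter of the rate, which is exactly the sort of reversal a map of totals hides.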

Look at San Bernardino County again...jeesz, it's heaving with meth labs.



This makes me a little more interested (perhaps concerned) as it's where I live. Notwithstanding that it's totals, look at that large, expansive area filled with loads of meth labs. How many?...there's about...errr, well, let me look at the legend. Hmm. It's dark blue. Does that make it 300, 500, 1000 meth labs?

It's impossible to tell without doing some assessment of the actual RGB values. It's actually closest to the RGB value about 1/3 along the legend colour ramp which would make it about 330ish...though there are no RGB values in the legend that match those found in San Bernardino County so it's impossible to be certain and why am I having to do an RGB analysis of a legend anyway? It shouts out from the map yet is nearer the lower end of the legend. That doesn't seem right.
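If you did want to do that RGB detective work in code, it might look something like the sketch below. Everything in it is an assumption: the legend colours are invented (I'd sample real pixels along the legend), the legend is assumed to run linearly from 0 to 1000, and the function names are mine.

```python
def nearest_ramp_position(rgb, ramp):
    """Return the fractional position (0..1) of the ramp colour closest to rgb.

    Closeness is plain squared Euclidean distance in RGB space, which is
    crude but good enough for a rough read of a legend.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(range(len(ramp)), key=lambda i: dist2(rgb, ramp[i]))
    return best / (len(ramp) - 1)


def estimate_value(rgb, ramp, vmin, vmax):
    """Turn a ramp position into a data value, ASSUMING a linear legend scale."""
    return vmin + nearest_ramp_position(rgb, ramp) * (vmax - vmin)


# Hypothetical legend sampled at four evenly spaced positions, light to dark.
# Note how the last three samples are all similar dark blues.
legend_ramp = [(240, 248, 255), (20, 60, 140), (10, 40, 110), (0, 20, 80)]

# Hypothetical colour sampled from a county on the map.
county_rgb = (18, 58, 135)

print(round(estimate_value(county_rgb, legend_ramp, 0, 1000)))
```

The nearest match will rarely be exact, so any number this produces is an estimate at best. The point stands: nobody should need to do this to read a map.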

So San Bernardino leaps out because 1. it's the largest county in the US, 2. it has a lot of meth labs (though possibly not per capita or in relation to counties with many more) and 3. it's dark blue and that means 'more', except there's virtually no differentiation between the blue used at 330 and that for 1000. All the variation in colour value is at the lower end.



The map uses an unclassified choropleth approach. That means every data value is given its own position along the chosen colour ramp. I'm not a huge fan of unclassified choropleths. Choropleths are generally used to show where places are similar and that relies on classifying your data into groups that display similar characteristics. All you can really see from an unclassed choropleth is the extremities...which areas tend to the maximum and which to the minimum. It's really difficult to assess where those in-between values might sit...and that's assuming the scale is linear and the colour scheme is applied linearly. Of course, you can stretch colour to be applied non-linearly but then it's an even more confusing picture that's arguably more difficult to interpret visually. If you don't classify data before mapping it then you're painting by numbers and it's a bypass to considering your data and teasing out the message through careful classification and symbolisation.
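To make 'every data value gets its own position along the ramp' concrete, here's a rough sketch of the linear case. The two end colours and the 0 to 1000 range are my own assumptions, not anything read off the map.

```python
def unclassed_colour(value, vmin, vmax, light=(239, 243, 255), dark=(8, 48, 107)):
    """Map a value linearly onto a two-colour ramp and return an (r, g, b) tuple.

    The light and dark end colours are arbitrary blues chosen for illustration.
    """
    if vmax == vmin:
        raise ValueError("vmin and vmax must differ")
    t = (value - vmin) / (vmax - vmin)
    t = max(0.0, min(1.0, t))  # clamp values outside the range
    return tuple(round(a + (b - a) * t) for a, b in zip(light, dark))


# Made-up counts: most areas are small, one is huge.
for count in (5, 20, 330, 1000):
    print(count, unclassed_colour(count, 0, 1000))
```

Run it and the colours for 5 and 20 come out only a few RGB units apart. With skewed data and a linear ramp, most areas pile up at one end of the ramp and only the extremes stand out.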

I'm going to add a caveat here - if the map is for interactive web display and the user can hover or click an area to retrieve the value directly, then unclassed choropleths are, arguably, less problematic because people can retrieve values across the map. I'd still contend, however, that if we know the map is classified into, say, 5 classes using natural breaks then every county symbolized in the same shade of blue is 'similar'. It's an important metric we can easily see in the map and it's a good default. Other classification schemes exist to suit alternative purposes. If we use, say, a quantile scheme of 5 classes then we know each class shows 20% of the data values in rank order - again, similarity between values, across the entire range of values, can be easily seen and it's simple to see which areas are in the top 20% of values. If you make two choropleths then using something like a quantile scheme allows you to compare the two maps on a comparable cognitive basis. Clicking to retrieve a value is an additional step in the map reading process. Trying to remember values from one hovered-over area to another is equally taxing because our short term recall is not our best cognitive function (think of memorizing and recalling a pack of cards in order...it's not easy!). I like maps to 'show and tell' rather than require further processing or actions by the user to reveal the message.
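As an illustration of the quantile idea, here's a bare-bones sketch. The counts are invented and the function names are mine; proper GIS or statistics packages handle ties and edge cases far more carefully than this does.

```python
def quantile_breaks(values, n_classes=5):
    """Return the upper bound of each class so classes hold roughly equal counts."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for k in range(1, n_classes + 1):
        # Index of the last value in class k: ceil(k * n / n_classes) - 1.
        idx = -(-k * n // n_classes) - 1
        breaks.append(ordered[idx])
    return breaks


def classify(value, breaks):
    """Return the zero-based class index of the first break that value fits under."""
    for i, upper in enumerate(breaks):
        if value <= upper:
            return i
    return len(breaks) - 1


# Invented, heavily skewed counts for ten areas.
counts = [0, 1, 2, 3, 4, 6, 9, 20, 90, 330]
breaks = quantile_breaks(counts, n_classes=5)
classes = [classify(v, breaks) for v in counts]
print(breaks)
print(classes)
```

Each of the five classes ends up holding two of the ten areas, so the darkest shade means 'top 20%' whatever the raw numbers happen to be. Tied values would break that tidy split, which is one reason to lean on a tested library for real work.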

Onto the colours. Because there are just so many different shades of blue across the map we get a sense of some overall pattern but we can't really tell which are similar to which. How similar is San Bernardino County's colour compared to the other dark blues across the other side of the map? It's called simultaneous contrast and is a problem for our map reading. Our perception of a colour (or shades of a colour) varies as we look across the map due to the colours that surround it. Look at the following two grey squares and how they are affected by the surrounding shades:


The grey square differs in perception depending on whether it's surrounded by dark or light.  A darker surround makes us see it lighter than if it has a lighter surround. Now look at how different colours modify the grey square:


The grey squares, despite being the same, take on a perceived tinge of colour based on what's around them. And when the image gets even more complex we have even more difficulty processing what we see. In the following animation, which grey square, A or B, is darker?


Of course, the greys in A and B are the same. In the above diagrams all the grey squares are seen differently simply because of their surroundings. The map of meth labs has over 3,000 counties, each shade of blue being surrounded by its own different mix of blues.

These perceptual issues are also a problem in classed choropleths of course - but not nearly to the same degree because it's much easier to distinguish and differentiate 5 or 6 shades of blue across a map than it is to try and make sense of several hundred (thousands?) different shades of blue.

And what about labels? Yes we can probably all recognise it's the U.S. I know where my home is so I recognise San Bernardino County. I've no real way of describing where other patterns exist in language that makes sense. Giving people context is important. Interactive maps support this through basemap labels or, again, hover and click...but however you deliver the map, give people a way to reference the patterns they see.

So the take-aways for me...
  • It's totals. If you can't or won't change to a rate or ratio then use something other than choropleth like a dot density, proportional symbol, dasymetric or cartogram.
  • If you have to use unclassed choropleths then scale your data across the range of colour so that extremities don't dictate the way values map onto the colours. Make the legend more useful by providing labels at key positions and make your map interactive so people can retrieve values.
  • Go with a classed choropleth if you want people to 'see' more than just the extremities in your data and how different areas are similar to others for all values that display similar characteristics. Learn which classification techniques are going to manage your data most appropriately for the message you want to share.
  • Be aware of the problems of simultaneous contrast.
  • Include some form of labelling to give people a way of referencing the geographical patterns they see.
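
On the second point above, one way (of many) to stop the extremities dictating the colours is a logarithmic stretch, with the legend labelled at the matching positions. This is only a sketch under my own assumptions: a 0 to 1000 range, a log stretch rather than any other, and tick values I picked myself.

```python
import math


def log_stretch(value, vmin, vmax):
    """Return a 0..1 ramp position using a logarithmic stretch of the data range."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    clamped = max(vmin, min(vmax, value))
    return math.log1p(clamped - vmin) / math.log1p(vmax - vmin)


def linear_stretch(value, vmin, vmax):
    """Return a 0..1 ramp position using a plain linear stretch, for comparison."""
    clamped = max(vmin, min(vmax, value))
    return (clamped - vmin) / (vmax - vmin)


# Where legend labels would need to sit along the ramp under the log stretch.
legend_ticks = {v: round(log_stretch(v, 0, 1000), 2) for v in (0, 10, 100, 1000)}

for v, pos in legend_ticks.items():
    print(f"label {v} at ramp position {pos} (linear would be {linear_stretch(v, 0, 1000):.2f})")
```

The low and middle values get far more of the ramp than they would linearly, but the legend is no longer evenly spaced, so those labelled ticks stop being optional.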

Other problems...no real title, no source, no credits, no dates, no contact details. Nothing. Like I said, I don't know where the map came from but as is, it's a fail in every respect.

Finally, I tried to get the data to recreate this as a per capita but after a quick search I wasn't able to find it at county level. Instead I came across this abomination on the Drug Enforcement Administration web site:


I don't even know where to start with this one, and they've made one per year for the last few years. They were clearly on something or other. And if we assume the DEA reporting is accurate (and the most current) AND that the Amazing_Maps one is broadly of the same time period (OK, a lot of assumptions) then what's with San Bernardino having over 300 meth labs given California as a whole has only 79?

Clearly something's wrong somewhere. Amazing map? Possibly. It's just poorly designed and constructed and gives a totally misleading impression of a dataset that cannot be verified. It's another potentially mildly interesting dataset that's poorly mapped.

And by the way, San Bernardino County is the 5th most populous county in the US so per capita...we may even have a paucity of meth labs so a different map might support the assertion we need more to get our supply increased*. Additionally, while the overall area of the county is about 20,000 sq miles, the populated areas are predominantly crammed into the south west corner in an area roughly 450 sq miles...which makes a choropleth map of totals covering mostly desert even less useful (unless the meth labs are in the desert). And all those less important smaller areas...Seattle, St Louis, Tulsa and Grand Rapids. But because of the way the boundaries lie, choropleths are always going to cause difficulties in interpretation. That's the Modifiable Areal Unit Problem...and a whole different blog entry.

* this is a joke
