Tuesday, 24 February 2015

Messy heat map

My last blog post on heat maps was an attempt to persuade map-makers that the term actually means something other than what you might think it means...and that doing cluster analysis of some form or other on your data more than likely requires a better understanding of data and technique than a so-called heat map generator provides.

My Twitter feed lit up today as Manchester City played Barcelona in the Champions League Round of 16. Lionel Messi, the Barcelona forward, had a fine game by all accounts (to be fair he pretty much always does) and Squawka were on the ball with their live analysis during the game.

Squawka provide a web-based view of sport that collates and presents data as it happens. They tweeted the following:



Needless to say it brought on a nervous carto-twitch. If you read the previous blog post you'll know by now that whatever the above is, it's not a 'heat map'. It's a density map produced by some form of cluster analysis, but it illustrates far more than just another inappropriately named map.

Here are some of the issues I see with this map and how similar issues are seen in almost all of these sorts of maps. They may help to understand that it isn't, in fact, a map of Messi leaking all over the pitch.

What is the data that was used?

Messi presumably ran about the pitch yet the splodges look like they are based on point data. Is this where he had the ball? Where he received the ball? Where he passed the ball to another player? Where he was stationary for a period or simply where he stood watching as Suarez scored the goals? Etc etc. While logic suggests that a map of Messi's running should be linear we are immediately confused in trying to decipher what this data actually represents because it looks like points that have been analysed to create a representation of clusters (more points = larger or more intense splodge). Without knowing what data the map represents we cannot decide whether he was all over the final third or not. If the data is indeed points then is that an appropriate metric and can it justifiably be used to show what they purport the map to be showing?

What are the fuzzy splodges?

The typical symbology on these sorts of maps tends to go through some form of spectral colour scheme and this is no different. The hazy blue splodges are likely where less clustering occurred but is this a fleeting movement or pass or where he tripped over his laces? As Messi moves more or passes more (or whatever more) the intensity of the symbology increases. But what precisely does this represent? We have no legend to tell us what changes in colour mean and whether colour is mapped onto the clustering values linearly or logarithmically or...
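To see why the missing legend matters, here is a minimal sketch of how the same density values land in different colour classes depending on the scaling. The values, the five-class ramp and the function names are all made up for illustration; nothing here is known about how Squawka actually map values to colour.

```python
import math

def colour_class_linear(value, vmin, vmax, n_classes):
    """Place a value into one of n_classes equal-width bins on a linear scale."""
    t = (value - vmin) / (vmax - vmin)
    return min(int(t * n_classes), n_classes - 1)

def colour_class_log(value, vmin, vmax, n_classes):
    """Place a value into one of n_classes equal-width bins on a log scale."""
    t = (math.log(value) - math.log(vmin)) / (math.log(vmax) - math.log(vmin))
    return min(int(t * n_classes), n_classes - 1)

# Hypothetical density values (not taken from the Messi map).
densities = [1, 10, 100, 1000]

linear_classes = [colour_class_linear(d, 1, 1000, 5) for d in densities]
log_classes = [colour_class_log(d, 1, 1000, 5) for d in densities]

print(linear_classes)  # [0, 0, 0, 4]
print(log_classes)     # [0, 1, 3, 4]
```

On the linear ramp three of the four values share the coolest colour; on the log ramp they spread across the scheme. Same data, very different picture, and without a legend the reader cannot tell which one they are looking at.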

Indeed - if you look at the overlap of two hazy blue splodges near the bottom centre of the map you'll notice that a simple overlap at the edge of two hazy blue splodges results in a bright, intense change in symbol. But if these hazy blue splodges are built from point data (presumably at the centre of the hazy blue splodge) then the overlap is simply an artifact of overlapping symbology...not necessarily overlapping data. These artifact overlaps occur everywhere on the map so it's unclear what the relationship is between data and symbology and how that then translates to Messi's actual movement or involvement.
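The overlap problem can be sketched in a few lines, assuming the splodges are additive Gaussian-style kernels drawn around each point. That is an assumption about how these maps are typically built, not a known detail of this one, and the point locations and radius are hypothetical.

```python
import math

def kernel(distance, radius=1.0):
    """Gaussian-style falloff: 1.0 at the point itself, fading with distance."""
    return math.exp(-(distance ** 2) / (2 * radius ** 2))

def intensity(x, points, radius=1.0):
    """Sum the kernel contribution of every point at location x (1-D for brevity)."""
    return sum(kernel(abs(x - p), radius) for p in points)

# Two hypothetical events, three units apart, with nothing recorded in between.
points = [0.0, 3.0]
midpoint = 1.5

one_alone = intensity(midpoint, [points[0]])  # the faint edge of a single splodge
both = intensity(midpoint, points)            # where the two edges overlap

print(one_alone, both)
```

The intensity at the midpoint doubles purely because two faint edges are stacked on top of each other, even though no event was recorded there. The brightening is a product of the symbology, not of the data.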

The claim that he was all over the final third doesn't exactly stack up either. The main splodges are in a zone towards the top of the pitch graphic...a little left of centre but certainly not all in the final third. We'll assume Barcelona are attacking the left half of the pitch graphic and that even though teams switch sides at half-time the graphic maintains teams in the same half for mapping purposes.
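If the underlying point data were available, a claim like "all over the final third" could be checked with a simple count rather than eyeballed from splodges. A rough sketch, assuming x coordinates measured in metres along a 105 m pitch and a team attacking towards x = 0 (the left of the graphic); the coordinates and the function name are invented for illustration.

```python
def share_in_final_third(xs, pitch_length=105.0, attacking_left=True):
    """Return the fraction of event x-coordinates that fall in the attacking third."""
    third = pitch_length / 3
    if attacking_left:
        in_third = [x for x in xs if x <= third]
    else:
        in_third = [x for x in xs if x >= pitch_length - third]
    return len(in_third) / len(xs)

# Hypothetical event locations, not real match data.
event_xs = [10, 20, 40, 50, 60, 90]

share = share_in_final_third(event_xs)
print(f"{share:.0%} of events in the final third")
```

A single number like this is at least quantifiable and comparable between players, which is more than can be said for the graphic.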

All in all it's a graphic that reveals very little except gross error and uncertainty and which is utterly impossible to interpret in a way that reveals anything sensible about Messi's contribution to the game. 

These sorts of back-of-an-envelope 'heat maps' are unhelpful for any visual or analytic task. Quick to produce, yes, but you can't make any sensible or quantifiable interpretation. Finally, we have no one else's maps to look at so we simply have to presume that every other player's heat map is in some way visually inferior to Messi's.

Messy data. Messy clustering. Messy symbology. Messy map. Messy communication and very messy ability to interpret, compare or understand. Poor old Lionel Messi who is, quite literally, an innocent bystander in all of this...as the map, sort of, shows.
