“# needs to stop reporting and start predicting”
– Michael Carty, twitter.com/mjcarty
The quote above from Michael on Twitter this morning (you can see the full thing here https://twitter.com/MJCarty/status/493812100138287104) got me thinking. It mostly got me thinking about how I seem to be developing a particular breed of Big Data iconoclasm these days.
The reason Michael’s tweet particularly got under my skin was the way in which it homed in on what I believe to be the central marketing fallacy of the whole Big Data showreel. Let’s make it clear – lots of data gives you the ability to examine the past in increasingly minute detail. Maybe even the present. It does not, and let me repeat that, does not give you a magical window into the future.
Why am I becoming so entrenched in these views? Is it just the signs of an increasingly belligerent and grumpy old man? Possibly, but let me post-rationalise for you anyway.
Firstly, I lived through the era of data warehousing in the late 1990s into the last decade. Every time I see somebody painstakingly creating a management “R/A/G” report by hand-tinting cells in a PowerPoint table, I know that data warehousing didn’t work.
The volumes of data involved in data warehousing were microSD-card-sized in comparison to the data sets being talked about in the realm of Big Data. But the reason data warehousing didn’t work isn’t that there wasn’t enough data… in fact, even at the scale of aggregation involved in those days, the data was getting too much for many people to get their heads around. The reason data warehousing didn’t particularly work was that people didn’t change their behaviours. If you passionately believe in something, you seek out data that supports your view and you dismiss data that doesn’t – this is a psychological trick usually called confirmation bias, in effect a selection bias in the evidence we choose to look at.
But was there data to support the implementation of data warehousing in the first place? Yes – and it was mostly made up. Oh, how many times did I hear (and even retell) the story of beer and nappies in a supermarket? It was loosely based on some facts, but was a story as much as anything else in the fiction aisle.
Step forward to the present day, and there are a whole bunch of psychological biases and tricks at play when it comes to Big Data. The first is the deep-seated human need to reduce ambiguity through predicting the future. This has been explored at great length by Dan Gardner in his book Future Babble, and also by the Freakonomics team in this podcast. We listen to those who claim to see into the future because they reduce our own uncertainty. Big Data is just the latest in that long trend that goes back to the Oracle at Delphi and beyond.
Not only do we listen to those who predict the future, but those who occasionally get it right get great plaudits – this is known as survivorship bias and it’s a bit like asking a lottery winner for advice on how to win the lottery (actually we kind of do this – witness the signs up in some lottery ticket vendors proclaiming how many winners they’ve had…). Looking at success stories of prediction without looking at all of the failed predictions is alluring, but nonsensical.
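To make the survivorship point concrete, here is a rough sketch in Python. The setup is entirely made up for illustration: imagine one pundit for every possible sequence of up/down calls on ten events, so none of them has any foresight at all. The function name and the list of outcomes are my own inventions, not anything from a real data set.

```python
from itertools import product


def perfect_records(outcomes):
    """Count pundits whose calls matched every single outcome.

    Hypothetical setup: there is exactly one pundit per possible
    sequence of 0/1 calls, so nobody is using any real insight.
    """
    rounds = len(outcomes)
    pundits = product([0, 1], repeat=rounds)  # every possible call sequence
    return sum(1 for calls in pundits if list(calls) == list(outcomes))


outcomes = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1]  # arbitrary, made-up results
total = 2 ** len(outcomes)
winners = perfect_records(outcomes)
print(f"{winners} of {total} pundits called all {len(outcomes)} events correctly")
# prints: 1 of 1024 pundits called all 10 events correctly
```

Whatever the outcomes turn out to be, somebody ends up with a flawless record. Interview only that one pundit and they look like a genius; look at all 1,024 and they look like arithmetic.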
Now, Big Data might actually be quite good at predicting the future (most of the time) in comparison to humans, because Big Data will predict based on trends. If we could put the issue of selection bias aside, that might then make for better decision making. Except when the unexpected occurs, at which point Big Data extrapolating trends will fall on its backside.
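Here is a minimal sketch of that failure mode, assuming the simplest possible "trend" model: a straight line fitted by least squares. The sales figures, the shock value and the function names are all hypothetical, invented purely to show the shape of the problem.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up history: ten periods of perfectly steady growth.
periods = list(range(10))
sales = [100 + 10 * t for t in periods]

intercept, slope = fit_line(periods, sales)
forecast = intercept + slope * 10  # extrapolate the trend one period ahead

# Hypothetical shock in period 10 that nothing in the history hinted at.
actual_after_shock = 50
error = forecast - actual_after_shock
print(f"forecast {forecast:.0f}, actual {actual_after_shock}, miss {error:.0f}")
```

The fit to the past is perfect, and that is exactly why the forecast is so confidently wrong: no amount of extra history would have contained the shock.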
And here’s the other big challenge for Big Data – that it’s great at spotting correlations but does nothing to understand causality. And that’s a big issue, as Tim Harford eloquently explains in this article in the FT. A lot of the data-driven decision-making so beloved of the likes of Amazon and Google ignores concepts of causal links. Relying on a correlation by building a business around it without understanding why it occurs is a very risky game.
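A small, hedged illustration of how that can happen. In this made-up example a hidden factor drives two series, neither of which appears anywhere in the other's formula, and yet they correlate almost perfectly. The variable names and numbers are hypothetical, and the Pearson calculation is written out by hand so that nothing depends on a particular library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical hidden driver (say, footfall in the store each week).
hidden = [1, 2, 3, 4, 5, 6, 7, 8]

# Fixed, made-up wobble so the two series are not exact multiples of each other.
wobble_a = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
wobble_b = [-0.5, 0.4, -0.3, 0.2, 0.5, -0.2, 0.1, -0.2]

# Neither product's sales depend on the other's, only on the hidden driver.
product_a = [2 * h + w for h, w in zip(hidden, wobble_a)]
product_b = [3 * h + w for h, w in zip(hidden, wobble_b)]

r = pearson(product_a, product_b)
print(f"correlation between product A and product B: {r:.3f}")
```

A correlation-hunting system would happily report that A "predicts" B. Move A next to B on the shelf and nothing changes, because the thing doing the work was never in the data you were looking at.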
Now, having vented my spleen, is there any use for Big Data? Well, sure. Our ability to process and interrogate massive data sets as a result of Moore’s Law has some fantastic possibilities – from genome research to machine translation. But if something like “making better business decisions” didn’t work with smaller data, I’m very skeptical that it will work when more data is pumped into the process. And I’m even more skeptical when it’s thought Big Data can do actual magic.
As a final thought, when considering Big Data, here’s an alternative lens to look through. There are many Cloud-based big data providers. They charge for their services based on a combination of how much processing they do, how much data they store, and how much data they pass into and out of their systems. You don’t need Big Data analytics to see what a compelling business case Big Data is for those companies…