Picture the scene. We’ve created a multi-sheet Excel workbook, maybe with links into other workbooks, and it’s used by a dozen people in our department. Gill, who originally created the spreadsheet, left the organisation years ago. But we’ve managed to keep it going, adding duplicate sheets where we weren’t entirely clear how Gill’s formulae and macros actually worked. Every so often our boss asks for something new, so we quickly add a new column, or a new pivot table, or maybe a duplicate sheet. There’s a lot of “can you get out of the master spreadsheet?” to be heard shouted around our office. We get by.
IT turn up. “You need a database. It will solve all of your operational inefficiencies,” they tell us, promising clever reporting tools and a bunch of whizzy stuff that sounds great. It doesn’t sound too expensive, either. So over a number of months we work with them to create our perfect system. We didn’t realise that we had to stop making changes to our master spreadsheet whilst this was happening.
When eventually the system is ready to go live there are a few teething problems, but it all goes pretty smoothly. There are a few things missing, so we keep the spreadsheet going too, just in case. And then the boss asks for something new, and we ask IT. “We’ll get that to you in six weeks,” they say. And then they talk about money. And so we just do it in the spreadsheet.
Over time, the database doesn’t really get used again…
-/-
On Monday Chris & I had a fascinating conversation with tech journalist Kate Bevan on this week’s WB40 Podcast. One thing Kate said has been reverberating around my head ever since – that today all businesses are data businesses.
I think that part of the reason this concept resonated with me so strongly is that, for all of the talk of the promise of data science and big data, my observations tell me that most organisations are extremely poor when it comes to the good practice of managing data.
There are multiple reasons for this. The first is that strong data management is an activity which requires bending human approaches to fit the machines. Although we have had many years of NoSQL approaches to database management, fundamentally you still need (in my opinion) to understand the rigour of Edgar Codd’s approach. But the rules of relational database design (and particularly conceptual entity-relationship modelling), whilst crucial to good data management, are really quite hard to get one’s head around. Entities and relationships are how computers process data, not how humans think.
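To make that concrete, here’s a minimal sketch – in Python, with invented entities, so an illustration rather than anything prescriptive – of the sort of thing entity-relationship thinking forces you to pin down before a single row of data is stored: what the entities are, what identifies them, and how they relate.

```python
# A minimal, hypothetical entity-relationship sketch: two invented entities and
# the one-to-many relationship between them, of the kind a Codd-style design
# makes you decide up front.
from dataclasses import dataclass


@dataclass
class Customer:           # entity
    customer_id: int      # primary key
    name: str


@dataclass
class Order:              # entity
    order_id: int         # primary key
    customer_id: int      # foreign key: the relationship back to Customer
    total: float


# One customer, many orders. A human happily thinks "Bob's orders";
# the machine needs the relationship spelled out as a key it can join on.
customers = {1: Customer(1, "Bob")}
orders = [Order(101, 1, 25.00), Order(102, 1, 12.50)]
bobs_orders = [o for o in orders if o.customer_id == 1]
```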
The second is that the real world is much messier than the logical structures that computers need to make sense of the world around them.
Years back I spent quite a bit of time looking at structures to encapsulate intellectual property ownership around television programmes. Not only are there multiple layers of such ownership (writers, producers, production companies, performers, secondary performers (that bit of music playing in the background of the pub scene?)), but then there are other dimensions like rights to sell by “channel” (terrestrial, satellite, cable, pay-per-view streaming, subscription streaming, airline…) or by geography (UK, France, Germany) or language (French-speaking Belgium, German-speaking Switzerland)… And then you’d get special exclusions – for example many people in the industry included clauses in contracts that forbade the sale of their work into South Africa in the Apartheid era.
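To give a flavour of the combinatorics, here’s a toy sketch – with invented names and values, and nothing like the detail of the real structures – of just a handful of the dimensions a single rights grant can vary across.

```python
# A toy sketch of a rights grant - names and values are invented for
# illustration. The real structures had many more dimensions, plus time
# windows, exclusivity and sub-licensing.
from dataclasses import dataclass


@dataclass
class RightsGrant:
    rights_holder: str        # writer, producer, performer, production company...
    channel: str              # terrestrial, satellite, cable, streaming, airline...
    territory: str            # UK, France, German-speaking Switzerland...
    language: str
    excluded_territories: tuple = ()   # contractual exclusions


grants = [
    RightsGrant("Writer A", "terrestrial", "UK", "English"),
    RightsGrant("Composer B", "subscription streaming", "Belgium", "French",
                excluded_territories=("South Africa",)),
]

# Answering "can we sell this programme on that channel, there, in that
# language?" means checking every grant against every dimension - and each
# new channel or territory multiplies the combinations again.
```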
You end up either with data not being structured other than in documents like contracts (one of the reasons why there is so much hype around AI in the legal services space), or with abstractions of the complexity of the data that don’t quite represent reality to the level of detail that is really required. It was interesting that on many occasions in my time at Microsoft I would hear folk baffled by how the media industry could come up with a set of intellectual property ownership rules that was so complicated; I’d then refer people first to how complicated physical property ownership is, and then point them at the complexity of software licensing.
The final factor that hampers effective data management in organisations is best described as “the expediency of Excel”. One of my many potted theories is that if you wanted to bring the average organisation to its knees, the quickest route would be to bork first email and then Excel. Because, despite the hype of the ERP industry in particular, most organisations actually run on Excel and email.
When complicated, often macro-powered spreadsheets are discovered by IT departments, the common response is to say “You need a database”. And on the face of it, that’s often the right assessment. The problem is that it misses the single killer feature of Excel, and the thing that makes it most dangerous for effective data management: extreme user agency. Move the data into a database and that agency goes; the moment a change is required, the loss of control becomes an issue. And changes will always be required, because we live in an ever-changing world. And with the memory of modern PCs imposing few practical limits, spreadsheets can become enormous.
The answer to these challenges? Well, as ever, it’s certainly not going to come from technology alone. It strikes me that building good data architecture capability across organisations might be the key; central data teams should focus on coaching and developing good practice throughout, rather than being the centre of all expertise. It’s perfectly possible to create good spreadsheets for the management and manipulation of data, but only if the people doing so know what they are doing in the context of the tools within their organisation.
Canonical sources of data are important, but so too are high-level, conceptual data models. CDMs should be like a Rosetta Stone for an organisation, providing a common reference point for the semantic meaning of things across its systems. Too often, because they look a bit like what a physical database structure might be, they get mistaken for technical design documents; they’re not – they’re expressions of business meaning. That doesn’t mean that they’re especially intelligible to the lay person, but, as I said before, the fundamentals of data architecture have remained pretty much unchanged since the 1960s.
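As an illustration only – the entities and wording below are invented – a conceptual model fragment reads more like a glossary of business meaning and relationships than a table design:

```python
# A toy, invented fragment of a conceptual data model: business terms, what
# they mean, and how they relate to each other - deliberately nothing about
# tables, keys or column types.
conceptual_model = {
    "Customer": {
        "definition": "A person or organisation we have agreed to supply.",
        "relationships": ["places Order", "is billed via Invoice"],
    },
    "Order": {
        "definition": "A request from a Customer for products or services.",
        "relationships": ["placed by Customer", "fulfilled by Shipment"],
    },
}

for entity, detail in conceptual_model.items():
    print(f"{entity}: {detail['definition']}")
```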
If data really is the way forward for organisations, they do need to start building up the capability to design and manage data effectively; otherwise some really bad things are going to happen in the over-complex spreadsheets that underpin so many parts of so many businesses. In fact, those bad things are happening already.
This. A million times over, this.
We (business & IT) have managed to create a situation that is exquisitely mismatched to the needs of the organization. IT focuses on projects, the end users are concerned about the products that they use to make their living, and the concerns of the organization as a whole (what data is available for enhancing decision-making, what’s authoritative, what’s redundant, what’s flat-out wrong/stale, and what exists outside of formal systems) are ignored.
We’ve centralized what should be federated and federated (on a shadow basis) what should be central concerns.
I’ve been thinking about this quite a lot recently from a tech perspective. I’ve also been talking to a range of users.
We should compare notes over lunch soon.