The tap and the lake – the changing nature of information flows

Explaining the metaphor

I came across the concept of the tap and the lake in a discussion paper on the future of corporate reporting (The changing flows of corporate performance information). The tap describes historical (and current) corporate reporting (periodic, unidirectional, controlled by the company), while the lake recognises that digitisation has changed the world. Data and information relating to a company can now trickle, or flood, in from multiple sources; can arrive at any time; can be accessed by anyone, for almost any purpose; and can, perhaps, be contaminated.

The discussion paper raises interesting and important questions about how corporate reporting should change as a result of the new reality. I thought I was going to document my thoughts from an investment perspective. But I find myself being drawn to a higher level of abstraction (who saw that coming?!). Surely this metaphor applies equally well to news. Once upon a time the newspaper was the tap, controlled by the publisher, delivering periodic and unidirectional information. Fast forward to our current context and the news lake (ocean?) is fed from millions of sources, not all of which are reliable or well-meaning.

Data | information | knowledge | understanding | wisdom

This section’s title sets out a clear data hierarchy. Data is some set of symbols (numbers, letters, emojis…); information is contextualised data (so ‘C’ in one context is the initial letter of a person’s name, and in a different context – Roman numerals – represents the number 100); knowledge is organised information; understanding is interpreted knowledge; and wisdom is utilised understanding.
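To make the data-to-information step concrete, here is a minimal Python sketch. The contexts and the interpreter function are purely illustrative (my own invention, not from the discussion paper): the point is simply that the same raw symbol yields different information depending on the context applied to it.

```python
# Hypothetical illustration: data (a raw symbol) becomes information
# only once a context is applied to it.

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

def contextualise(datum: str, context: str):
    """Turn a raw symbol (data) into information by applying a context."""
    if context == "initial":
        return f"initial letter of a name: {datum}"
    if context == "roman":
        return ROMAN_VALUES[datum]
    raise ValueError(f"unknown context: {context}")

print(contextualise("C", "initial"))  # the letter C, read as a name initial
print(contextualise("C", "roman"))   # the same symbol, read as the number 100
```

The same datum, two different pieces of information – which is exactly why contextualisation done badly (or maliciously) matters so much further down the hierarchy.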

The first point to make, therefore, is that the tap is providing – and the lake contains – information, possibly knowledge, but not data. This is not shocking given the history, as the intended consumer was a human, and humans generally do better with contextualised data than with the raw data itself. But it is also important, as propaganda is nothing other than data that has been contextualised in a particular way, for a particular purpose. I recognise that propaganda is a strong and potentially emotive word, but I use it deliberately. Consider the last annual report published by Enron, or WorldCom, or any comparable entity, before it declared bankruptcy. What label should we attach to that information? Is propaganda too strong a label, if the intent of the report was to deliberately mislead? Or consider fake news. Is it annoying pollution that is an inevitable by-product of the modern economic machine? Or is it purposeful and, possibly, state-sponsored? In which case we should label it appropriately – as propaganda, designed to mislead us.

How do we progress?

If we accept that the goal is to get to wisdom, in order to make wise decisions, and we know that some of the information available to us is contaminated, then what should we do? [Please note that we are not talking about data scrubbing or cleansing here; we are talking about contaminated information, so data that may or may not be clean, that has been wrongly contextualised – whether in error or deliberately.]

I can see two broad options:

  1. accept the current reality, and use the information as best we can. Whether this would involve an attempt to clean the information before analysis, or the application of statistical filters during the analysis, is outside my knowledge domain;
  2. change the future reality by attaching a reliability score to each item of information. Again, this is beyond my sphere of knowledge, but conceptually I am aiming for the equivalent of a record of provenance attached to each piece of information. Presumably this would require a new internet protocol, which sounds difficult – but it also sounds like an increasingly important public good given the likely digital content of our future lives and economies.
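As a thought experiment, the record-of-provenance idea in option 2 might look something like the sketch below. The field names, the 0-to-1 reliability scale, and the example actors are all my own assumptions; no such protocol exists today.

```python
# Hypothetical sketch: an item of information carrying a reliability
# score and an append-only record of provenance.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceEvent:
    actor: str       # who touched the information (source, auditor, ...)
    action: str      # e.g. "published", "audited"
    timestamp: str   # when the event was recorded

@dataclass
class InformationItem:
    content: str
    source: str
    reliability: float                      # 0.0 (untrusted) .. 1.0 (fully vetted)
    provenance: list = field(default_factory=list)

    def record(self, actor: str, action: str):
        """Append an event to this item's provenance trail."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.provenance.append(ProvenanceEvent(actor, action, stamp))

# Invented example: a company publishes a figure, an auditor vets it.
item = InformationItem("Q3 units shipped: 10,412", source="ACME Corp", reliability=0.6)
item.record("ACME Corp", "published")
item.record("Independent Auditors LLP", "audited")
item.reliability = 0.9   # the audit raises the reliability score
```

The essential design choice is that the provenance trail travels with the information itself, so any consumer can inspect how (and by whom) the item reached its current reliability score.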

The rise of technological tools

We know that machine learning algorithms are becoming increasingly accurate at labelling pictures of cats (and at other tasks); we have observed the success of two-way buyer and seller ratings on online platforms; and we are aware of firms using algorithms to assign weightings (believability or reliability scores) to employees to improve decision making. In this light, we can envisage all items of information being vetted by technology as they are added to the lake, with a tag, perhaps containing a reliability score, attached to each. Don’t we effectively do this already – albeit in a qualitative and unstructured way – when we choose to emphasise or de-emphasise certain elements when making a decision?
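The believability-weighting idea can be sketched in a few lines: combine competing inputs, but let each one count in proportion to its holder's reliability score. The forecasts and weights below are invented for illustration, and a real system would be far more elaborate.

```python
# Hypothetical sketch of believability-weighted aggregation:
# each (value, weight) pair is an opinion and its reliability score.

def believability_weighted(opinions):
    """Combine opinions, weighting each by its reliability score."""
    total_weight = sum(weight for _, weight in opinions)
    return sum(value * weight for value, weight in opinions) / total_weight

# Invented example: three revenue forecasts with differing reliability.
forecasts = [(100, 0.9), (140, 0.3), (110, 0.6)]
print(believability_weighted(forecasts))  # roughly 110 - the outlier is damped
```

A plain average of the same forecasts would be pulled noticeably higher by the low-reliability outlier; the weighting is what damps it.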

In fact, could we imagine a blockchain-like technology being used to implement this record of provenance? A particular item of information would, presumably, have better provenance if it came from the company directly. And the provenance could be improved further if the information passed through an independent audit process (which would also be recorded alongside the information itself).

With some form of public vetting of data having been done, applying distributed ledger / blockchain technology would also make the data effectively uneditable: any later tampering would be evident. Everyone would then be free to add their own context, and to organise the information as they saw best. In essence we are aiming for freely available data – a public good – and competition in the uses to which the data is put.
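A toy illustration of that "uneditable" property: hash-chaining entries, as a blockchain does, links every entry to the hash of everything before it, so any later edit to an earlier entry breaks the chain and is detectable. This is a deliberate simplification (a real distributed ledger also involves replication and consensus across many parties), and the ledger entries are invented.

```python
# Toy sketch of tamper-evidence via hash-chaining (not a real blockchain:
# no replication, no consensus - just the linking property).
import hashlib

def chain(items):
    """Build a ledger where each entry embeds the hash of its predecessor."""
    prev = "genesis"
    ledger = []
    for item in items:
        digest = hashlib.sha256((prev + item).encode()).hexdigest()
        ledger.append({"item": item, "hash": digest})
        prev = digest
    return ledger

def verify(ledger):
    """Recompute every hash; any edited entry breaks the chain."""
    prev = "genesis"
    for entry in ledger:
        if hashlib.sha256((prev + entry["item"]).encode()).hexdigest() != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

ledger = chain(["shipment: 100 units", "invoice: #123", "payment received"])
assert verify(ledger)                        # untouched ledger checks out
ledger[0]["item"] = "shipment: 999 units"    # someone edits an old entry...
assert not verify(ledger)                    # ...and the tampering is detected
```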

How would this change corporate reporting?

As already stated, the goal is wise decisions – and the issue I am exploring is whether we can reduce the noise within the data hierarchy, whether introduced by error or malicious intent. The suggested mechanism is to separate out the contextualisation and make it transparent. Corporate reporting would therefore also be split in the same way. The corporate would be one of the many parties contributing information to the lake, and it would receive a reliability rating. It could submit data in real time – how many units of which products left the factory gates at what time; how many units at which price were invoiced to which customer, and when; whether the customer paid in full, and when. The customer, of course, could be submitting equally transparent data to its own lake – and the relevant cross-checks could be made.
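In the simplest case, the cross-check between the company's lake and its customer's lake could be a reconciliation of the records both parties hold for the same transactions. The invoice identifiers and figures below are invented for illustration.

```python
# Hypothetical reconciliation: two parties' records of the same invoices.
supplier_lake = {
    "INV-001": {"units": 100, "price": 9.50},
    "INV-002": {"units": 40,  "price": 9.50},
}
customer_lake = {
    "INV-001": {"units": 100, "price": 9.50},
    "INV-002": {"units": 40,  "price": 8.75},  # the two parties disagree
}

def cross_check(a, b):
    """Return the invoices where the two parties' records disagree."""
    shared = a.keys() & b.keys()
    return sorted(inv for inv in shared if a[inv] != b[inv])

print(cross_check(supplier_lake, customer_lake))  # ['INV-002']
```

Any disagreement flagged this way would feed straight back into the reliability scores of the two contributors – which is the incentive to report accurately in the first place.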

With transparency like that, investment analysts could seek to add value for their clients with the accuracy of their modelling – and the accuracy of the context they apply. Meanwhile the corporate can now periodically release a narrative into the lake. The narrative is more likely to take the form of knowledge or understanding – the corporate should know itself better than external analysts. But the narrative can be checked against the data for reasonableness, or the creep of propaganda.

Why would this be better?

The dissemination of information, whether news or a corporate report, has always been subject to change over time. However, the digital revolution has delivered a huge chunk of change in a short period of time. We now have the opportunity to reorganise the plumbing – to reassign roles and responsibilities to suit the new skillsets. We can now assign much more work to computers, the cloud and the crowd, rather than to individual humans. In the near term there will be roles for humans in interpreting knowledge and using the resulting understanding – and maybe in the long term too. But a redesign could help us get on the front foot regarding fake news, and lead us to better decisions.