Dealing with data inventory

Data held by businesses is often described as an asset. This can be misleading or even incorrect. In either case, data managed inappropriately leaves value on the table, inflates cost, reduces responsiveness, and creates risk.

Some data held by businesses would better be described as inventory. It might one day be a true asset, but equally that might never come to pass. If it’s better described as inventory, how then should we deal with it?

Valuing data inventory

While it’s a very 21st-century aspiration to be driven by data, the term “asset” evokes an accounting tradition founded in 8th century Persia, and indeed accounting principles help reveal the truth about the value of data held by organisations.

An asset is a resource that can be used to produce positive economic value.

Inventory is goods and materials that a business holds for the ultimate goal of resale, production or utilisation. You might think this fits the definition of asset, and you would be right. Along with cash and similar items, inventory is accounted for as one type of current asset.

Although inventory is an asset, it is valued more conservatively than other assets: at the lesser of cost and net realisable value. Given the high complexity of realising value from data, we might conservatively consider the net realisable value of data inventory, and hence the lesser valuation, to be 0.

There may be many scenarios, where the opportunity is demonstrably large or the path to deploying data solutions is clear, in which you can argue for a higher valuation for data inventory, but 0 is a benchmark valuation to keep in mind.
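The lesser-of-cost-and-net-realisable-value rule is simple enough to sketch in a few lines of Python. This is only an illustration: the function name `inventory_valuation` and the figures are hypothetical, and the default net realisable value of 0 is the conservative benchmark suggested above, not an accounting requirement.

```python
def inventory_valuation(cost: float, net_realisable_value: float = 0.0) -> float:
    """Value inventory at the lesser of cost and net realisable value.

    The default net realisable value of 0 is the conservative benchmark
    for data inventory; pass a higher figure only where it can be argued for.
    """
    if cost < 0:
        raise ValueError("cost must be non-negative")
    return min(cost, net_realisable_value)


# Illustrative figures only (hypothetical data set that cost 250,000 to build).
benchmark = inventory_valuation(250_000)                # no demonstrated route to value
clear_path = inventory_valuation(250_000, 400_000)      # strong case: capped at cost
partial_case = inventory_valuation(250_000, 100_000)    # weaker case: capped at NRV
```

Note that even with a compelling case, the valuation can never exceed what the data cost to acquire and process.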

So how you value your data will depend on whether you consider it as an asset or inventory, and we need some sort of test to discriminate.

The test for data inventory

Now we come to the test for data asset vs data inventory, and its implications.

Following the accounting principle of conservatism, I would argue that, unless the data is actively consumed in a business process (or customer experience, etc), we cannot consider it an asset, because it fails to demonstrate that it can be used to produce positive economic value.

If the data is instead held for some future purpose in some state of processing, but not yet consumed by an end user, then it is inventory.
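Expressed as code, the test is a single question asked of each data set. The sketch below is one possible encoding in Python; the `DataSet` type, its `actively_consumed` flag and the example data set names are all hypothetical, and in practice deciding whether data is "actively consumed" takes judgement rather than a boolean in a catalogue.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    ASSET = "asset"
    INVENTORY = "inventory"


@dataclass
class DataSet:
    name: str
    # True only if the data is consumed today in a business process
    # or customer experience (hypothetical flag for illustration).
    actively_consumed: bool


def classify(data_set: DataSet) -> DataClass:
    """Apply the conservative test: no demonstrated consumption, no asset."""
    if data_set.actively_consumed:
        return DataClass.ASSET
    return DataClass.INVENTORY


# Hypothetical examples.
recommendations = DataSet("product_recommendations", actively_consumed=True)
clickstream_archive = DataSet("raw_clickstream_archive", actively_consumed=False)
```

Here `classify(recommendations)` gives `DataClass.ASSET`, while `classify(clickstream_archive)` gives `DataClass.INVENTORY`, however promising the archive might look.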

Data liabilities

Acquiring, processing and holding analytic data comes at a cost. (I’m talking about the cost, effort and risk over and above maintaining the operational systems that are the source of analytic data, and note that some architectural patterns, like event sourcing, can somewhat bridge the operational-analytic gap.) The cost of licensing, compute, storage and network is part of this, but the cost of people to analyse and govern data, and to implement solutions that extract, transform, load, secure, catalogue and monitor data, may well be greater. These costs, or liabilities, apply to both data assets and data inventory.

Data assets, by the test above, further incur a cost to serve.

Holding data also represents additional potential liabilities – such as fines or remediation cost – or impairment of other assets – such as brand – in the case of a data breach. The likelihood, impact and remediation effort of a data breach all become greater the more data in volume and variety you hold (though can be offset with good controls and mitigations), and arguably provision should be made for these potential liabilities if we are accounting for data.

These considerations show that data inventory carries almost all the same substantial liabilities as data assets, but delivers none of the value.
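A back-of-envelope comparison makes the asymmetry concrete. The Python sketch below is illustrative only: the function `net_position`, its parameter names and every figure are hypothetical, and a real assessment would need far more care in estimating each term.

```python
def net_position(
    holding_cost: float,
    breach_provision: float = 0.0,
    cost_to_serve: float = 0.0,
    value_delivered: float = 0.0,
) -> float:
    """Value delivered minus the liabilities of holding (and serving) data."""
    return value_delivered - holding_cost - breach_provision - cost_to_serve


# Hypothetical annual figures, in arbitrary currency units.
# An asset carries every liability, plus a cost to serve, but delivers value.
asset_position = net_position(
    holding_cost=100, breach_provision=10, cost_to_serve=20, value_delivered=300
)

# Inventory carries the same holding cost and breach provision,
# has no cost to serve, and delivers no value.
inventory_position = net_position(holding_cost=100, breach_provision=10)
```

With these made-up numbers the asset nets out at 170 and the inventory at -110: the liabilities differ only by the cost to serve, while the value differs entirely.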

So how do we build more data assets with less data inventory?

Leaning into data inventory

Lean practice, which originated in 20th century Japan, also identifies inventory as a major form of waste, consistent with the accounting perspective.

Lean further gives us actionable guidance on reducing inventory and other wastes, as I described in “7 wastes of data production – when pipelines become sewers”.

The key steps to deliver value with minimal waste in Lean are:

  1. Define value [for a customer]
  2. Map the value stream
  3. Create flow
  4. Establish pull
  5. Continuously improve

Note that simply accumulating data because it is available, and with the hope that it might be useful later, is entirely contrary to the lean approach. It really shouldn’t be surprising that neglecting to understand consumption scenarios and how to service them, and instead ingesting data onto an analytic platform with the intent to figure out how to use it later, can lead to data inventory (in addition to technology inventory in platform components that are not utilised).

There may be some cases where we can make a compelling case that a particular analytic data set would be unique or otherwise interesting enough to justify maintaining it as inventory. But again, without demonstrated value in consumption, we can make multiple arguments against the utility of data collected speculatively:

  • Historical data may no longer be relevant when the time comes to use it, due to drift as the world and people’s behaviours change
  • Speculative data or its metadata may be otherwise unfit for purpose for a given future application, as we can only fully assess data and metadata quality with respect to a particular use case
  • Quality and relevance may be more important than sheer volume of data, and it may be possible to collect sufficient data, just in time, over the lifespan of an application initiative
  • The use case may be solvable without reference to historical data

So a lean process for data initiatives would aim to do the following:

  1. First understand customer problems or opportunities that could be possible applications for data, analytics and ML/AI, and choose the most valuable to pursue.
  2. Then determine appropriate data source(s) and how to acquire, then move and transform data to a point where it can be served as a valuable product, insight or experience to some consumer.
  3. Ensure that data flows consistently and reliably from source(s) to point(s) of consumption. Build quality in. If there are issues, address the root cause, and repeat. Note that availability of a service is a major value lever and that inconsistency is another form of waste in Lean.
  4. Understand the cycle on which the service needs updating – what latency is acceptable, when does a model need to be retrained, etc – so that value is pulled by the consumer only as needed rather than pushed from the source.
  5. Continuously improve.
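Step 4, establishing pull, can be sketched as a refresh decision driven by the consumer's tolerance for staleness rather than by the arrival of new source data. This is a minimal Python illustration under assumed names: `needs_refresh` and `max_staleness` are hypothetical, and a real pipeline would usually express the same idea through its scheduler or orchestration tool.

```python
from datetime import datetime, timedelta


def needs_refresh(
    last_refreshed: datetime, now: datetime, max_staleness: timedelta
) -> bool:
    """Refresh only when the served data is staler than the consumer accepts."""
    return now - last_refreshed > max_staleness


# Hypothetical consumer requirement: a dashboard that tolerates day-old data.
acceptable_latency = timedelta(hours=24)
last_run = datetime(2024, 1, 1, 9, 0)

# Six hours later the consumer is still well served, so no work is done.
check_same_day = needs_refresh(last_run, datetime(2024, 1, 1, 15, 0), acceptable_latency)

# A day and a half later the data is too stale, so a refresh is pulled.
check_next_day = needs_refresh(last_run, datetime(2024, 1, 2, 21, 0), acceptable_latency)
```

The design choice is that the trigger belongs to the point of consumption: if no consumer has stated an acceptable latency, there is nothing to pull, and arguably nothing to build yet.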

This approach may sound very different to how data, analytics, ML & AI initiatives work in your organisation. Much as financial governance monitors these accounting measures, data governance may consider adopting some of these techniques. Though some changes may be easy to achieve and others much harder, to me, this difference shows the potential to eliminate waste in how we work with data, especially in the form of data inventory.