Dealing with data inventory

Data held by businesses is often described as an asset, but there are cases where this can be misleading or even incorrect. In any case, data managed inappropriately leaves value on the table, inflates cost, reduces responsiveness, and creates risk.

Some data held by businesses would better be described as inventory. It might one day be a true asset, but equally that might never come to pass. If it’s better described as inventory, how then should we deal with it?

Diagram summarising this post (Dealing with data inventory overview): assets may be inventory, in which case it's particularly important to examine the potential liabilities, then use lean practice to ensure data consumption, creating assets rather than inventory.

Valuing data inventory

While it’s a very 21st-century aspiration to be driven by data, the term “asset” evokes an accounting tradition founded in 8th-century Persia. Indeed, accounting principles help reveal the truth about the value of data held by organisations.

An asset is a resource that can be used to produce positive economic value.

Inventory is goods and materials that a business holds for the ultimate goal of resale, production or utilisation. You might think this fits the definition of an asset, and you would be right. Along with cash and similar items, inventory is accounted for as one type of current asset.

Rows of coffee bean varieties in a store

Although inventory is an asset, it is valued more conservatively than other assets: at the lesser of cost and net realisable value. Given the high complexity of realising value from data, we might conservatively take the lesser of the two to be the net realisable value, and set the net realisable value of data inventory to 0.

There may be many scenarios – where the opportunity is demonstrably large, or the path to deploying data solutions is clear – in which you can argue for a higher valuation for data inventory, but 0 is a benchmark valuation to keep in mind.
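The valuation rule can be sketched in a few lines of Python. This is only an illustration of "lesser of cost and net realisable value" with 0 as the default benchmark; the function name, parameters and figures are hypothetical, not an accounting standard.

```python
from typing import Optional


def inventory_value(cost: float, net_realisable_value: Optional[float] = None) -> float:
    """Value inventory at the lesser of cost and net realisable value (NRV).

    Assumption for this sketch: if no NRV has been argued for,
    fall back to the conservative benchmark of 0.
    """
    nrv = 0.0 if net_realisable_value is None else net_realisable_value
    return min(cost, nrv)


# Illustrative figures only.
unproven = inventory_value(cost=250_000.0)                               # no case made for NRV
argued = inventory_value(cost=250_000.0, net_realisable_value=90_000.0)  # a case has been made
```

Even when a case can be made for a higher net realisable value, the valuation is still capped by what the data cost to acquire and hold.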

So how you value your data will depend on whether you consider it as an asset or inventory, and we need some sort of test to discriminate.

The test for data inventory

Now we come to the test for data asset vs data inventory, and its implications.

Following the accounting principle of conservatism, I would argue that data is only an asset if it is actively consumed in a business process, customer experience, etc. We cannot consider data an asset if it fails to demonstrate that it can be used to produce positive economic value.

If the data is instead held for some future purpose in some state of processing, but not yet consumed by an end user, then it is inventory.

Diagram: Data inventory test

The implication is that the valuation of data inventory will be as low as 0, much less than data assets, which are valued according to economic returns.
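The test itself is simple enough to express as code. The sketch below assumes a hypothetical catalogue entry that records which business processes or customer experiences actively consume a dataset; the `Dataset` type and its fields are illustrative, not taken from any particular catalogue tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    name: str
    # Hypothetical field: processes or experiences actively consuming this data.
    active_consumers: List[str] = field(default_factory=list)


def classify(dataset: Dataset) -> str:
    """Apply the test: actively consumed data is an asset, anything else is inventory."""
    return "asset" if dataset.active_consumers else "inventory"


churn = Dataset("churn_scores", active_consumers=["retention campaign"])
clickstream = Dataset("raw_clickstream_archive")  # held for some future purpose

labels = {d.name: classify(d) for d in (churn, clickstream)}
```

Here `churn_scores` is classified as an asset and `raw_clickstream_archive` as inventory. In practice, deciding what counts as "active" consumption is the hard part, and this sketch leaves that judgement to whoever maintains the consumer list.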

Data liabilities

Direct cost

Acquiring, processing and holding analytic data comes at a direct cost. The cost of licensing, compute, storage and network is part of this, but the cost of people to analyse and govern data, and to implement solutions that extract, transform, load, secure, catalogue and monitor analytic data (above and beyond operational data), may well be greater. These costs, or liabilities, apply to data assets and inventory. Actively consumed data assets further incur a cost to serve.

Potential liabilities and impairment

Holding data also represents additional potential liabilities – such as fines or remediation cost – or impairment of other assets – such as brand – in the case of a data breach. The likelihood, impact and remediation effort of a data breach all become greater the more data in volume and variety you hold, and arguably provision should be made for these potential liabilities if we are accounting for data.
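One way to put a number on such a provision is a simple expected-loss calculation: likelihood multiplied by impact. This is my simplifying assumption rather than an accounting rule, and the function name, inputs and figures below are all hypothetical.

```python
def breach_provision(annual_breach_probability: float,
                     fines: float,
                     remediation_cost: float,
                     brand_impairment: float) -> float:
    """Expected annual loss from a data breach: likelihood x total impact.

    A deliberately crude sketch; real provisioning would follow the
    applicable accounting standards and a proper risk assessment.
    """
    if not 0.0 <= annual_breach_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    impact = fines + remediation_cost + brand_impairment
    return annual_breach_probability * impact


# Illustrative figures only: a 2% annual likelihood and 2,000,000 total impact.
provision = breach_provision(0.02, fines=1_000_000.0,
                             remediation_cost=500_000.0,
                             brand_impairment=500_000.0)
```

Since both likelihood and impact grow with the volume and variety of data held, every extra data set kept as inventory pushes both inputs, and so the provision, upwards.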

Opportunity cost

Finally, inventory creates opportunity cost in reduced responsiveness and throughput in delivering business value. This also impacts your people! While I mentioned people as a direct cost above, the bigger issue – for your business and your people – is people being employed unproductively and without purpose, simply managing data inventory.

While data assets justify these costs with the value delivered, data inventory carries almost all the same substantial liabilities as data assets, but delivers none of the value. So how do we build more data assets with less data inventory?

In identifying inventory as a major form of waste, Lean practice complements the accounting perspective, and also provides actionable guidance on reducing inventory and other wastes to improve value delivery.

Leaning into data inventory

Lean practice gives us actionable guidance on reducing inventory and other wastes, as I described in 7 wastes of data production – when pipelines become sewers.

The key steps to deliver value with minimal waste in Lean are:

Define value [for a customer]
Map the value stream
Create flow
Establish pull
Continuously improve

Lean principles and process diagram to address waste: define value, map the value stream, create flow, establish pull, continuously improve

Note that simply accumulating data because it is available, and with the hope that it might be useful later, is entirely contrary to the lean approach. It really shouldn’t be surprising that neglecting to understand consumption scenarios and how to service them, and instead ingesting data onto an analytic platform with the intent to figure out how to use it later, can lead to data inventory, in addition to technology inventory (as platform components that are not utilised).

Maybe we can make a compelling case that a particular analytic data set would be unique or otherwise interesting enough to justify maintaining it as inventory. But again, without demonstrated value in consumption, we can make multiple arguments against the utility of data collected speculatively:

Historical data may no longer be relevant by the time it comes to be used, due to drift as the world and people’s behaviours change
Speculative data or its metadata may be otherwise unfit for purpose for a given future application, as we can only fully assess data and metadata quality with respect to a particular use case
Quality and relevance may be more important than sheer volume of data, and it may even be possible to collect sufficient data just in time, over the lifespan of an initiative
The use case may be solvable without reference to historical data

Lean data

So a lean process for data initiatives would aim to do the following:

First understand customer problems or opportunities that could be possible applications for data, analytics and ML/AI, and choose the most valuable to pursue.
Then determine appropriate data source(s) and how to acquire, then move and transform data to a point where it can be served as a valuable product, insight or experience to some consumer.
Ensure that data flows consistently and reliably from source(s) to point(s) of consumption. Build quality in. If there are issues, address the root cause, and repeat. Note that availability of a service is a major value lever and that inconsistency is another form of waste in Lean.
Understand the cycle on which the service needs updating – what latency is acceptable, when does a model need to be retrained, etc – so that value is pulled by the consumer only as needed rather than pushed from the source.
Continuously improve.
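To make "establish pull" concrete, here is a minimal sketch of a data product that is rebuilt only when a consumer asks for it and the last build is older than the acceptable latency. The class, its parameters and the 24-hour figure are hypothetical; a real pipeline would sit behind an orchestrator or cache rather than a single in-memory object.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


class PulledDataProduct:
    """Rebuilds its value on consumer demand, and only when it has gone stale."""

    def __init__(self, build: Callable[[], dict], max_staleness: timedelta):
        self._build = build                  # the upstream acquire/transform step
        self._max_staleness = max_staleness  # the latency the consumer can accept
        self._value: Optional[dict] = None
        self._built_at: Optional[datetime] = None
        self.build_count = 0                 # how often upstream work was triggered

    def get(self, now: datetime) -> dict:
        stale = self._built_at is None or now - self._built_at > self._max_staleness
        if stale:
            self._value = self._build()
            self._built_at = now
            self.build_count += 1
        return self._value


# Illustrative usage: a consumer that tolerates data up to 24 hours old.
product = PulledDataProduct(build=lambda: {"rows": 3}, max_staleness=timedelta(hours=24))
start = datetime(2024, 1, 1)
product.get(start)                        # first request triggers a build
product.get(start + timedelta(hours=1))   # still fresh, no rebuild
product.get(start + timedelta(hours=25))  # stale, rebuilt on demand
```

No work happens until a consumer calls `get`, so nothing is produced ahead of demand. A pushed pipeline would instead rebuild on the source's schedule, whether or not anyone is consuming the result.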

This approach may sound very different to how data, analytics, ML & AI initiatives work in your organisation.

If you have existing data that doesn’t align to your current and future applications, then that looks like inventory that is more likely to be a liability than an asset, and you should consider its careful disposal.

Much like financial governance monitors accounting measures, data governance should consider adopting some of these techniques. Any gap between the lean approach and how your organisation works today shows the potential to eliminate waste in how we work with data, especially in the form of data inventory.