Dave speaking at DataEngBytes

7 wastes of data production – when pipelines become sewers

I recently had the chance to present an updated version of my 7 wastes of data production talk at DataEngBytes Melbourne 2023. I think the talk was stronger this time around and I really appreciated all the great feedback from the audience. Check out the video below and the slides.

Pipes screensaver: a very appropriate thumbnail

Thanks to Peter Hanssens and the DEB crew for having me as part of an impressive speaker lineup and for putting on a great event.

For the earlier versions, see the original 7 wastes post and the 2021 LAST Conference version, Data mesh: a lean perspective.

Key ideas

There’s a lot of ground to cover in 30 minutes with 7 wastes from both run and build lenses, plus 5 lean principles to address the waste. The single slide below is the best summary.

Lean principles to address wastes

The table below gives a brief description of each waste through the run and build lenses.

| Waste | Run | Build |
| --- | --- | --- |
| Overproduction | Unused products | Unused products |
| Inventory | Stored or processed data not used | Development work in progress |
| Overprocessing | Correcting poor quality data | Working with untrusted data |
| Transportation | Replication without reproducibility | Handoffs between teams |
| Motion | Manual intervention or finishing | Context switching |
| Waiting | Delays in taking action on business events | Delays due to handoffs or feedback lead time |
| Defects | Defects introduced into data at any point | Defects introduced into processing code |

7 wastes of data production in run and build

I’ll leave the summary here and encourage you to watch the video or read the slides if you want to know more.

Postscript: Inventory being a particularly pernicious waste, I have since expanded on it further in Dealing with data inventory.

