Dave speaking at DataEngBytes

7 wastes of data production – when pipelines become sewers

I recently had the chance to present an updated version of my 7 wastes of data production talk at DataEngBytes Melbourne 2023. I think the talk was stronger this time around and I really appreciated all the great feedback from the audience.

Thanks to Peter Hanssens and the DEB crew for having me as part of an impressive speaker lineup and for putting on a great event.

Check out the video below and the slides.

Pipes screensaver: a very appropriate thumbnail

For the earlier versions, see the original 7 wastes post and the 2021 LAST Conference version, Data mesh: a lean perspective.

Outline

There’s a lot of ground to cover in 30 minutes with 7 wastes from both run and build lenses, plus 5 lean principles to address the waste. I’ll leave the summary here and encourage you to watch the video or read the slides if you want to know more.

| Waste | Run | Build |
| --- | --- | --- |
| Overproduction | Unused products | Unused products |
| Inventory | Stored or processed data not used | Development work in progress |
| Overprocessing | Correcting poor quality data | Working with untrusted data |
| Transportation | Replication without reproducibility | Handoffs between teams |
| Motion | Manual intervention or finishing | Context switching |
| Waiting | Delays in taking action on business events | Delays due to handoffs or feedback lead time |
| Defects | Defects introduced into data at any point | Defects introduced into processing code |
7 wastes of data production in run and build