A bowl of famous Hokkaido soup curry

GenAI stone soup

GenAI (typically as an LLM) is pretty amazing, and you can use it to help with tasks or rapidly build all kinds of things that previously weren’t feasible.

Things that work some of the time.

The soup

But do you find yourself reworking large chunks of generated content, or facing major hurdles in getting a prototype to production?

In the case of taking your LLM prototype to production, you need it to work most, if not all, of the time. You soon realise you must:

  • Ensure that any data used is properly prepared, correct and current with appropriate controls
  • Consistently evaluate responses and manage regressions
  • Add some guardrails to prevent unexpected or adversarial inputs
  • Limit the outputs to a set of safe, useful and desirable choices
  • Re-imagine the UI to better suit the more constrained interaction
  • Improve latency and cost of the back-end to run at scale
  • And so on
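To make the "limit the outputs" point concrete, here is a minimal sketch (all names and the action set are illustrative, not from any particular library): instead of trusting free-form generation, map the model's raw text onto a fixed set of safe choices, with a fallback when nothing matches.

```python
# Illustrative guardrail: constrain raw LLM output to a known-safe action set.
ALLOWED_ACTIONS = {"refund", "escalate", "close_ticket"}  # hypothetical choices
FALLBACK = "escalate"  # safe default for unexpected or adversarial output

def constrain_output(raw_llm_text: str) -> str:
    """Return the first allowed action mentioned in the model output,
    or a safe fallback if the output is unexpected."""
    text = raw_llm_text.lower()
    for action in sorted(ALLOWED_ACTIONS):  # sorted for deterministic order
        if action in text:
            return action
    return FALLBACK
```

A real system would likely use structured output or schema validation rather than substring matching, but the shape is the same: the LLM proposes, deterministic code disposes.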

And all of a sudden the LLM is surrounded by a lot of supporting infrastructure, and may not be doing much itself. In a RAG solution it’s a mix, but data preparation and evaluation remain crucial. In other scenarios you might reduce the feature to a setup wizard, a semantic recommender feeding search, or a classifier matching content or triggering defined workflows, and so on, in which the LLM may play a small part, or no part at all.
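The "classifier triggering defined workflows" scenario might look like the following sketch (the routes and handler names are invented for illustration): a cheap classifier, here just keywords, though in practice perhaps an embedding model, routes each query to a deterministic handler, and the LLM only sees whatever nothing else can handle.

```python
# Illustrative routing layer: deterministic workflows first, LLM as fallback.
def handle_billing(query: str) -> str:
    return f"billing workflow: {query}"

def handle_shipping(query: str) -> str:
    return f"shipping workflow: {query}"

def handle_with_llm(query: str) -> str:
    # Placeholder for an actual LLM call.
    return f"LLM fallback: {query}"

ROUTES = [
    (("invoice", "charge", "refund"), handle_billing),
    (("delivery", "tracking", "shipping"), handle_shipping),
]

def route(query: str) -> str:
    q = query.lower()
    for keywords, handler in ROUTES:
        if any(k in q for k in keywords):
            return handler(query)
    return handle_with_llm(query)
```

In this setup the LLM is one component among several, and for many queries it is never invoked at all, which is exactly the point.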

(NB. This is one potential scenario. LLMs, though fundamentally unreliable, do have many genuine and transformative applications. Over time, we will also get better at avoiding and dealing with reliability failures.)

The stone

So we have a paradox: the technology that inspired the feature, or helped you get started on a task, doesn’t play a major role in implementing the feature or completing the task.

That doesn’t mean the LLM isn’t useful; it’s just that we should understand its utility more precisely.

The parable

In the parable of Stone Soup, a hungry traveller arrives at a village with nothing but a stone in their knapsack (the details vary by telling). Undeterred, they go to the first house proclaiming they have an amazing recipe for stone soup, if only they could borrow an onion. The villager obliges. At the next house the traveller does the same, but asks for carrots, then potatoes, and so on. Eventually they have enough for a hearty soup that feeds the whole village – all made from just a stone!

The lesson

The LLM is useful in the way the stone in the parable of Stone Soup is useful, as a catalyst for innovation.

The primary moral of Stone Soup is also relevant: each person contributes what they can to create something great for everyone. In this respect, complex software solutions are built by teams bringing together many simple parts. Also durably valuable is the discipline you might bring to curating your organisation’s unique data. With better data management and governance, you might get beyond opportunity soup to the main course (and on to nuts).

So don’t despair if your “GenAI” feature contains no LLM, it still played a useful roll! [sic]

Actual footage of LLM engineering from 1962 – couldn’t resist! (courtesy Horizon Book of Science)

Postres

Since writing this post (after talking about it for 12 months!), I was tickled to learn that two prominent AI researchers use this metaphor too. The first is Alison Gopnik, in regard to the LLM training process, on Berkeley Simons Institute News and the Santa Fe Institute Complexity podcast. The second is Subbarao Kambhampati, in regard to augmenting LLMs for reasoning tasks, on LinkedIn.

