How has my original post on 22 rules of generative AI aged in a period of rapid change? Are these solution considerations as enduring as I thought?
Let’s reflect on the original advice and developments in the meantime.
If you prefer to listen, I’ve covered some of this ground in AI conversations for every role.
Strategy and product management roles
1. Know what input you have to an AI product or feature that’s difficult to replicate. This is generally proprietary data, but it may be an algorithm tuned in-house, access to compute resources, or a particularly responsive deployment process, etc. This separates competitive differentiators from competitive parity features.
Amongst the big players, the story of the last two years has been about building bigger and bigger data sets and acquiring more and more compute resources. DeepSeek has recently underscored that algorithms tuned in-house to minimise consumption of compute resources can be an advantage too! However, none of the big players seems to have a sustainable moat against the others.
2. Interrogate the role of data. Do you need historical data to start, can you generate what you need through experimentation, or can you leverage your proprietary data with open source data, modelling techniques or SaaS products? Work with your technical leads to understand the multitude of mathematical and ML techniques available to ensure data adds the most value for the least effort.
In my work with many organisations pursuing generative AI opportunities, the conversation has almost always rapidly turned to the availability of differentiating data. Synthetic data is promising in closed domains where the answer is verifiable, and a simple prompt may be enough to prototype interactions, but both of these approaches have their limitations. As for using the best available techniques, I’ve frequently seen clever approaches bubble up in a GenAI stone soup.
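As a tiny illustration of why closed, verifiable domains suit synthetic data: when the correct answer can be computed, every generated example can be checked before it’s used for tuning or evals. The arithmetic task below is deliberately trivial and purely illustrative.

```python
# Sketch of synthetic data in a closed, verifiable domain: the ground truth is
# computed, not generated, so every example can be checked before use.
import random

def make_example() -> dict:
    a, b = random.randint(2, 99), random.randint(2, 99)
    return {
        "prompt": f"What is {a} + {b}? Reply with the number only.",
        "answer": str(a + b),  # computed ground truth
    }

dataset = [make_example() for _ in range(1000)]
```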
3. Understand where to use open source or Commercial Off-The-Shelf (COTS) software for parity features [also intended to cover SaaS], but also understand the risks of COTS including roadmaps, [custom] implementations, operations and data.
We’ve seen growth and diversification of providers, with models tuned for different applications, alongside a degree of standardisation in interfaces to LLMs and related capabilities like RAG and fine-tuning. This combination allows organisations to simplify solutions while remaining flexible. We should remember generative AI capabilities are not one-size-fits-all, and substantial integration effort remains in targeted applications and user experiences. More efficient models may also move from the cloud to the edge, impacting operations.
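To make the standardisation point concrete, here’s a minimal sketch (Python, using the OpenAI client library against OpenAI-compatible endpoints) of keeping provider choice as configuration rather than code. The base URLs, model names and key are placeholders, not recommendations.

```python
# Minimal sketch: many providers and local runtimes expose OpenAI-compatible
# chat endpoints, so swapping models can be a configuration change.
# Base URLs, model names and the key below are placeholders.
from openai import OpenAI

PROVIDERS = {
    "hosted": {"base_url": "https://api.example-provider.com/v1", "model": "example-large"},
    "local": {"base_url": "http://localhost:8000/v1", "model": "example-small"},
}

def complete(prompt: str, provider: str = "hosted") -> str:
    cfg = PROVIDERS[provider]
    client = OpenAI(base_url=cfg["base_url"], api_key="YOUR_KEY")
    response = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```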
4. Recognise that functional performance of AI features is uncertain at the outset and variable in operation, which creates delivery risk. Address this by: creating a safe experimentation environment, supporting dual discovery (creating knowledge) and development (creating software) tracks with a continuous delivery approach, and – perhaps the hardest part – actually responding to change.
Gary Marcus and others continue to remind us that LLMs have reliability problems, which may never be solved in this paradigm. Evals are a leading approach to supporting experimentation and to establishing likely performance and the range of behaviours on specific tasks; they suit continuous delivery, but may increase test time and inference costs substantially. In any case, it doesn’t seem feasible to take anything other than an iterative approach (and hence invest in your ability to iterate) when developing GenAI solutions.
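As a sketch of what a minimal eval might look like in a continuous delivery pipeline: a handful of task-specific cases with programmatic checks, run against whatever `complete(prompt)` function calls your model. The cases and checks here are purely illustrative.

```python
# Minimal sketch of an eval harness: run task-specific cases against the model
# and track the pass rate over time (e.g. as a continuous delivery gate).
# `complete` is any function that sends a prompt to a model and returns the reply.
EVAL_CASES = [
    {"prompt": "Summarise: 'The meeting is moved to 3pm Friday.'",
     "check": lambda out: "friday" in out.lower() and "3" in out},
    {"prompt": "Extract the email address from: 'Contact jo@example.com today.'",
     "check": lambda out: "jo@example.com" in out},
]

def run_evals(complete) -> float:
    passed = sum(1 for case in EVAL_CASES if case["check"](complete(case["prompt"])))
    rate = passed / len(EVAL_CASES)
    print(f"pass rate: {rate:.0%} ({passed}/{len(EVAL_CASES)})")
    return rate

# A CI job might fail the build when run_evals(complete) drops below a threshold.
```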
Design roles
5. Design for failure, and loss of vigilance in the face of rare failures. Failure can mean outputs that are nonsensical, fabricated, incorrect, or – depending on scope and training data – harmful.
We’ve seen plenty of examples of failure: $1 Chevrolets, out-of-policy airline refunds, daily rock ingestion quotas, etc. I’d contend we haven’t made great strides in designing for failure. Most progress in accommodating failure has come from users developing an understanding of how the technology works (either explicitly or intuitively), related to point 6. We’ve seen UX patterns that anthropomorphise generative AI solutions, at best a little misleading, at worst counter-productive or risking major failures due to over-reliance on non-existent generalisation or reasoning capabilities. While GenAI chatbots are distinguished from human chat, and now come with warnings about unreliability, there’s still no way for the user to distinguish good from bad responses – “endless right answers” come with fewer, but still endless, wrong answers. Feedback (thumbs up/down) options help signal variable quality to the user. Attributing sources (e.g. through RAG) is helpful, but it’s still possible for the attributed responses to contain generated hallucinations.
tl;dr – we’re all more vigilant because we know GenAI is fallible, but we need to know what’s been created by GenAI.
6. Learn the affordances of AI technologies so you understand how to incorporate them into user experiences, and can effectively communicate their function to your users.
As above, I think anyone who’s serious in this area has had plenty of opportunity to understand the affordances of GenAI, enabling them to bring empathy for users as a designer. Where products aren’t sympathetic to those affordances, it’s probably a result of perverse incentives. What we need here is another layer of critical thinking to question the appropriateness of GenAI for a particular use case – as below – and the damage it might be doing when deployed in inappropriate cases.
7. Study various emerging UX patterns. My quick take: generative AI may be used as a discrete tool with (considering #5) predictable results for the user, such as replacing the background in a photo; it may be used as a collaborator, reliant on a dialogue or back-and-forth iterative design or search process between the user and AI, such as ChatGPT; or it may be used as an author, producing a nearly finished work that the user then edits to their satisfaction (which comes with risk of subtle undetected errors).
Totally based on vibes, my current take on UX patterns is that experts have shied away from author, recognising the cost of errors and rework, while novices find rapidly getting 80% of the way there intoxicating. Collaborator remains a flexible pattern as it allows for iterative refinement, as a human conversation does, though ongoing wariness about high rates of hallucinations can again reduce collaboration to – favourably – rubber ducking or – unfavourably – Clever Hans. Much like the comparison between the failure of general self-driving and success of narrow driver assistance systems, we’ve seen a degree of adoption amongst a range of discrete tools like “summarise”, “proof read”, “rewrite” that have fairly predictable (but not perfect) results.
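As a rough illustration of the discrete tool pattern: narrow, named tasks with constrained prompts, rather than an open-ended chat. The prompts are illustrative, and `complete` stands in for whatever function calls your model.

```python
# Sketch of the "discrete tool" pattern: narrow tasks, constrained prompts,
# predictable (but not perfect) results. `complete` is any function that sends
# a prompt to a model and returns the reply.
def proofread(text: str, complete) -> str:
    prompt = (
        "Correct spelling, grammar and punctuation in the text below. "
        "Do not change meaning, tone or formatting. Return only the corrected text.\n\n"
        + text
    )
    return complete(prompt)

def summarise(text: str, complete, max_words: int = 50) -> str:
    prompt = (
        f"Summarise the text below in at most {max_words} words. "
        "Return only the summary.\n\n" + text
    )
    return complete(prompt)
```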
8. Consider what role the AI is playing in the collaborator pattern – is it designer, builder, tester, or will the user decide? There is value in generating novel options to explore as a designer, in expediting complex workflows as a builder, and in verifying or validating solutions to some level of fidelity as a tester.
I believe I’ve seen all of these roles implemented. Designer, at least of a first draft, solving the blank page problem, seems a productive place to start a new task (but is it eroding our ability to think creatively or critically?). The success of a builder seems to depend heavily on the design of the underlying tools that will be composed into a workflow: a clearly documented and comprehensive API may work well, but then GenAI provides only minimal value; on the other hand, trying to automate interaction with a simulated display and input devices minimises coding and produces impressive sequences, but still requires human intervention in most tasks and creates instant legacy. If we can’t rely on outputs, a tester may be best employed as a test designer, finding things to test, rather than a test evaluator, though LLM-as-judge is a pattern in evals where it’s otherwise hard to define a test.
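For completeness, here’s a sketch of the LLM-as-judge pattern mentioned above: a second model call grades an output against a rubric when a programmatic check is hard to write. The prompt and PASS/FAIL parsing are illustrative, and the judge of course inherits the same reliability caveats.

```python
# Sketch of LLM-as-judge: grade an answer against a rubric with a second model
# call, for cases where a programmatic check is hard to define. `complete` is
# any function that sends a prompt to a model and returns the reply.
def judge(question: str, answer: str, rubric: str, complete) -> bool:
    prompt = (
        "You are grading an answer against a rubric.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        f"Rubric: {rubric}\n"
        "Reply with exactly PASS or FAIL."
    )
    return complete(prompt).strip().upper().startswith("PASS")
```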
9. Design for explainability, to help users understand how their actions influence the output. (This overlaps heavily with #6)
Packaging GenAI as discrete tools helps connect actions to outputs. RAG attributing sources helps explain responses. Otherwise explainability is hard. “Reasoning” traces purport to show “thinking” steps but remain simulations of what someone thinking might say, rather than a reflection of a thinking process.
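Here’s a sketch of what source attribution can look like in a RAG pipeline: retrieved passages carry stable identifiers into the prompt, and the model is asked to cite them. `retrieve` is a stub standing in for a real search, and, as noted above, a cited response can still contain hallucinations.

```python
# Sketch of source attribution in RAG. `retrieve` is a stub standing in for a
# real vector or keyword search; `complete` is any function that calls a model.
# Cited responses can still contain content unsupported by the cited passages.
def retrieve(query: str) -> list[dict]:
    return [
        {"id": "policy-refunds-v3", "text": "Refunds are available within 30 days of purchase."},
        {"id": "faq-shipping", "text": "Standard shipping takes 3-5 business days."},
    ]

def answer_with_sources(query: str, complete) -> str:
    passages = retrieve(query)
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    prompt = (
        "Answer the question using only the passages below, and cite the ids of "
        "the passages you relied on in square brackets.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
    return complete(prompt)
```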
10. More and more stakeholders will want to know what goes into their AI products. If you haven’t already, start on your labelling scheme for AI features, which may include: intended use, data ingredients and production process, warnings, reporting process, and so on, with reference to risk and governance below.
Individuals and some organisations are choosing not to use GenAI due to a range of concerns about what goes into it, and are encouraging others to look critically at their own use. Creative workers in particular are concerned about appropriation of their work without compensation, and numerous legal proceedings concerning copyright are ongoing. Simultaneously, lawyers are learning and relearning the danger of relying on GenAI to discover (or, failing that, hallucinate) precedent. Educators are another group with concerns, having questioned the appropriateness of GenAI in educational settings due to instances of invented facts, reduced critical thinking, and perpetuated bias. Many people are choosing not to use GenAI, or advocating against its use, because of its present and anticipated future environmental impacts, for marginal benefit in many applications. Many jurisdictions have introduced AI safety and governance frameworks (more in the governance section) that specify controls and checkpoints related to concerns over what goes into an AI and how that AI is used.
In general, the more you can do in 2025 to align to stakeholder interests and assure your AI supply chain, the better.
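If it helps, here’s what a first cut of such a label might look like; the fields follow the list in rule 10, but the values and structure are illustrative rather than any standard schema.

```python
# Sketch of a label for an AI feature, along the lines of rule 10.
# Fields and values are illustrative, not a standard schema.
FEATURE_LABEL = {
    "feature": "ticket summarisation",
    "intended_use": "summarise support tickets for internal triage only",
    "data_ingredients": [
        "vendor base model (training data undisclosed)",
        "in-house tickets 2022-2024, PII redacted",
    ],
    "production_process": "prompted vendor model via API; no fine-tuning",
    "warnings": ["summaries may omit or invent details; verify before acting"],
    "reporting_process": "report issues via the internal AI register",
}
```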
Further roles
The original post also covered Data Science and Engineering, Architecture and Software Engineering roles, and Risk and Governance roles. I’ll come back to those in future posts (I hope), as this is already running long. I’m very aware these observations would still benefit from comprehensive references for readers other than myself, something else I’ll try to come back to!
For conciseness, I’ve also distilled just 4 rules, based on a framework I’ve been tweaking since Continuous Intelligence in 2018, though this doesn’t include the multiple-disciplines perspective I’ve used here, which has also evolved since it was introduced in Reasoning About Machine Intuition in 2017.
