I was thrilled to help kick off the GenAI Network Melbourne meetup at their first meeting recently. I presented a talk titled Semantic hide and seek – a gentle introduction to embeddings, based on my experiments with Semantle, other representation learning, and some discussion of what it means to use Generative AI in developing new products and services. It was a pleasure to present alongside Rajesh Vasa from A2I2 at Deakin University.
Thanks to Ned, Orian, Scott, Alex, Leonard & co for organising. Looking forward to more fun events in this series!
Thinking about adopting, incorporating or building generative AI products? Here are some things to think about, depending on your role or roles.
I assume you’re bringing your own application(s) based on an understanding of an opportunity or problems for customers. These rules therefore focus on the solution space.
Solutions with generative AI typically involve creating, combining or transforming some kind of digital content. Digital content may mean text, code, images, sound, video, 3D, etc, for digital consumption, or it may mean digitized designs for real world products or services such as code (again), recipes, instructions, CAD blueprints, etc. Some of this may also be relevant for how you use other people’s generative AI tools in your own work.
Strategy and product management roles
1. Know what input you have to an AI product or feature that’s difficult to replicate. This is generally proprietary data, but it may be an algorithm tuned in-house, access to compute resources, or a particularly responsive deployment process, etc. This separates competitive differentiators from competitive parity features.
2. Interrogate the role of data. Do you need historical data to start, can you generate what you need through experimentation, or can you leverage your proprietary data with open source data, modelling techniques or SaaS products? Work with your technical leads to understand the multitude of mathematical and ML techniques available to ensure data adds the most value for the least effort.
3. Understand where to use open source or Commercial Off-The-Shelf (COTS) software for parity features, but also understand the risks of COTS including roadmaps, implementation, operations and data.
4. Recognise that functional performance of AI features is uncertain at the outset and variable in operation, which creates delivery risk. Address this by: creating a safe experimentation environment, supporting dual discovery (creating knowledge) and development (creating software) tracks with a continuous delivery approach, and – perhaps the hardest part – actually responding to change.
5. Design for failure, and loss of vigilance in the face of rare failures. Failure can mean outputs that are nonsensical, fabricated, incorrect, or – depending on scope and training data – harmful.
6. Learn the affordances of AI technologies so you understand how to incorporate them into user experiences, and can effectively communicate their function to your users.
7. Study various emerging UX patterns. My quick take: generative AI may be used as a discrete tool with (considering #5) predictable results for the user, such as replacing the background in a photo, it may be used as a collaborator, reliant on a dialogue or back-and-forth iterative design or search process between the user and AI, such as ChatGPT, or it may be used as an author, producing a nearly finished work that the user then edits to their satisfaction (which comes with risk of subtle undetected errors).
8. Consider what role the AI is playing in the collaborator pattern – is it designer, builder, tester, or will the user decide? There is value in generating novel options to explore as a designer, in expediting complex workflows as a builder, and in verifying or validating solutions to some level of fidelity as a tester. However, for testing, remember you cannot inspect quality into a product, and consider building in quality from the start.
9. Design for explainability, to help users understand how their actions influence the output. (This overlaps heavily with #6)
10. More and more stakeholders will want to know what goes into their AI products. If you haven’t already, start on your labelling scheme for AI features, which may include: intended use, data ingredients and production process, warnings, reporting process, and so on, with reference to risk and governance below.
Data science and data engineering roles
11. Work in short cycles in multidisciplinary product teams to address end-to-end delivery risks.
12. Quantify the functional performance of systems, the satisfaction of guardrails, and confidence in these measures to support product decisions.
13. Make it technically easy and safe to work with and combine rich data.
14. Implement and automate a data governance model that enables delivery of data products and AI features to support the business strategy (i.e., a governance model that captures the concerns of other rules and stakeholders here).
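As one small illustration of rule 12, here is a minimal sketch of attaching a confidence interval to a measured pass rate, for example the share of test prompts that satisfy a guardrail. It uses the Wilson score interval, which is my choice for illustration rather than anything prescribed above; the function name `wilson_interval` and the example counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical evaluation: 90 of 100 test prompts satisfied the guardrail.
    low, high = wilson_interval(90, 100)
    print(f"pass rate 0.90, approx. 95% interval ({low:.3f}, {high:.3f})")
```

Reporting the interval alongside the point estimate helps a product team judge whether an apparent improvement is larger than the uncertainty in the measurement.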
Architecture and software engineering roles
15. Understand that each AI solution is narrow, but composable with other digital services. In this respect, treat each AI solution as a distinct service until a compelling case is made for consolidation. (Note that, as above, product management should be aware of how to make use of existing solutions.)
16. Consolidate AI platform services at the right level of abstraction. The implementation of AI services may be somewhat consistent, or it may be completely idiosyncratic depending on the solution requirements and available techniques. The right level of abstraction may be emergent and big up-front design may be risky.
17. Use continuous delivery for short feedback cycles and delivery that is both iterative – to reduce risk from knowledge gaps – and responsive – to reduce the risk of a changing world.
18. Continuous delivery necessitates a robust testing and monitoring strategy. Make use of test pyramids for both code and data for economical and timely quality assurance.
Risk and governance roles
19. Privacy and data security are the foundation on which everything else is built.
20. Generative AI solutions, like other AI solutions, may also perpetuate harmful content, biases or correlations in their historical training data.
21. Understand that current generative AI solutions may be subject to some or all of the following legal and ethical issues, depending on their source data, training and deployment as a service: privacy, copyright or other violation regarding collection of training data, outputs that plagiarise or create “digital forgeries” of training data, whether the aggregation and intermediation of individual creators at scale is monopoly behaviour and whether original creators should be compensated, that training data may include harmful content (which may be replicated into harmful outputs), that people may have been exposed to harmful content in a moderation process, and that storing data and the compute for training and inference may have substantial environmental costs.
22. Develop strategies to address the further structural failure modes of AI solutions, such as: misalignment with user goals, deployment into ethically unsound applications, the issue of illusory progress where small gains may look promising but never cross the required threshold, the magnification of rare failures at scale and the resolution of any liability for those failures.
These are the type of role-based considerations I alluded to in Reasoning About Machine Creativity. The list is far from complete, and the reader would doubtless benefit from sources and references! I intended to write this post in one shot, which I did in 90 minutes while hitting the target 22 rules without significant editing, so I will return after some reflection. Let me know if these considerations are helpful in your roles.
The metaverse is a topic currently, though the concept has a long history. Twenty years ago, in the dotcom era, I was exploring this space, as I was recently reminded. Feeling nostalgic, I dug these projects out of the NAS archives. Tech has moved on, but there’s enduring relevance in what I learned.
Virtual Online Orienteering, to the max! Conceived as similar to Catching Features, but a game that was online by default, as you could play in the browser in a virtual world authored in VRML97.
The UI consisted of two browser windows: a first person view and motion controls (using the now defunct Cosmo Player), and a map in a second window.
The draggable compass needle, the checkpoints and the course logic (must visit checkpoints in order), and a widget that visualised your completed route as an electric blue string hovering 3 feet above the ground were all modular VRML Protos.
The map and terrain for the only level ever created were generated together with a custom C++ application. I was pretty pleased this all worked, and it demonstrated some concepts for…
4DUniverse was a broad concept for virtual online worlds for socialising, shopping, gaming, etc, similar at the time to ActiveWorlds (which still exists today!), but again accessible through the browser (assuming you had a VRML plugin).
I thought I’d have great screengrabs to illustrate this part of the story, but I was surprised how few I’d captured, that they were very low resolution, and that they were in archaic formats. The source artefacts from this post – WRL, HTML, JS, JAVA, etc – have lost no fidelity, but would only meet modern standards and interpreters to various degrees. Maybe I will modernise them someday and generate new images to do justice to the splendour I held in my mind!
We authored a number of worlds, connected by teleports, with the tools we had to hand, being text editors, spreadsheets, and custom scripts. While a lot of fun, we came to the conclusion that doing the things we envisaged in the 4DUniverse wasn’t any more compelling than doing them in the 2D interfaces of the time. VRML eventually went away, and probably because no one else was able to make a compelling case for its use. At least I crafted a neat animated GIF logo rendered with POV-Ray.
Less multiverse, and more quantum realm, I also generated VRML content at nanoscale with…
NanoCAD was a neat little (pun intended) CAD application for designing molecules, which I extended with a richer editing UI, supporting the design of much more complex hypothesised molecular mechanisms.
The Java app allowed users to place atoms in 3D and connect them with covalent bonds. Then an energy solver would attempt to find a stable configuration for the molecules (using classical rather than quantum methods). With expressive selection, duplication and transformation mechanics, it was possible to create benzene rings, stitch them into graphene sheets, and roll them up into “enormous” buckytubes, or other complex carbon creations.
I also created cables housed inside sheaths, gears – built with benzene ring teeth attached to buckytubes – and other micro devices. If 4DUniverse was inspired by Snow Crash, NanoCAD was inspired by The Diamond Age. NanoCAD could run in a browser as an applet and the molecules could also be exported as WRL files for display in other viewers.
Comparing contemporary professional projects
It’s nice to contrast the impermanence of these personal projects with the durability of my contemporary professional work with ANCA Machines. At the time, I was documenting the maths and code of Cimulator3D and also developing the maths and bidirectional UI design for iFlute, both used in the design and manufacturing of machine tools via grinding processes. Both products are still on the market in similar form today, more than two decades later.
I wonder how I’ll view this post in another twenty years?
Picking up threads from previous posts on solving Semantle word puzzles with machine learning, we’re ready to explore how different solvers might play along with people while playing the game online. Maybe you’d like to play speed Semantle against an artificially intelligent opponent, maybe you’d like a left-of-field hint on a tricky puzzle, or maybe it’s just fun to spectate at a cerebral robot battle.
The solvers have a view of how words relate due to a similarity model that is encapsulated for ease of change. To date, we’ve used the same model as live Semantle, which is word2vec. But as this might be considered cheating, we can now also use a model based on the Universal Sentence Encoder (USE), to explore how the solvers perform with separated semantics.
To recap, the key elements of the solver ecosystem are now:
make_guess() – return a guess that is based on the solver’s current state, but don’t change the solver’s state,
merge_guess(guess, score) – update the solver’s state with information about a guess and a score,
Scoring of guesses by either the simulator or a Semantle game, where a game could also include guesses from other players.
It’s a simplified reinforcement learning setup. Different combinations of these elements allow us to explore different scenarios.
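The elements above can be sketched in a few lines of Python. This is a toy illustration rather than the actual solver code: the `FilterSolver` strategy, the `TOY_VECTORS` embeddings and the `play` loop are all hypothetical stand-ins, and only the `make_guess()` / `merge_guess(guess, score)` interface comes from the recap.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical toy embeddings standing in for a real model (word2vec, USE, ...).
TOY_VECTORS: Dict[str, Tuple[float, float]] = {
    "cat": (1.0, 0.1),
    "dog": (0.9, 0.3),
    "car": (0.1, 1.0),
    "bus": (0.2, 0.9),
    "tree": (0.6, 0.6),
}


def cosine_similarity(a: str, b: str) -> float:
    """Encapsulated similarity model: swap this function to change semantics."""
    va, vb = TOY_VECTORS[a], TOY_VECTORS[b]
    dot = sum(x * y for x, y in zip(va, vb))
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(y * y for y in vb))
    return dot / (norm_a * norm_b)


class FilterSolver:
    """Toy solver: keep only words consistent with every score seen so far."""

    def __init__(self, vocab: Sequence[str],
                 similarity: Callable[[str, str], float],
                 tolerance: float = 1e-9) -> None:
        self.candidates: List[str] = list(vocab)
        self.similarity = similarity
        self.tolerance = tolerance

    def make_guess(self) -> str:
        # Read-only: returns a guess without changing the solver's state.
        if not self.candidates:
            raise RuntimeError("no candidates left")
        return self.candidates[0]

    def merge_guess(self, guess: str, score: float) -> None:
        # Update state from any scored guess, whether ours or another player's.
        self.candidates = [
            w for w in self.candidates
            if abs(self.similarity(w, guess) - score) <= self.tolerance
        ]


def play(solver: FilterSolver, score_fn: Callable[[str], float],
         target: str, max_guesses: int = 20) -> List[Tuple[str, float]]:
    """Simulator loop: score each guess and feed the result back to the solver."""
    history: List[Tuple[str, float]] = []
    for _ in range(max_guesses):
        guess = solver.make_guess()
        score = score_fn(guess)
        history.append((guess, score))
        if guess == target:
            break
        solver.merge_guess(guess, score)
    return history


if __name__ == "__main__":
    target = "bus"
    solver = FilterSolver(list(TOY_VECTORS), cosine_similarity)
    for guess, score in play(solver, lambda g: cosine_similarity(g, target), target):
        print(f"{guess}: {score:.3f}")
```

Because `merge_guess` accepts any scored guess, the same solver can be primed with a history of other players' guesses before it makes its own, which is the basis for the scenarios that follow.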
Let’s look at how solvers might play with people. The base scenario friends is the actual history of a game played with people, completed in 109 guesses.
Solvers could complete a puzzle from an initial sequence of guesses from friends. Both solvers in this particular configuration generally easily better the friends result when primed with the first 10 friend guesses.
Solvers could instead make the next guess only, but based on the game history up to that point. Both solvers may permit a finish in slightly fewer guesses. The conclusion is that these solvers are good for hints, especially if they are followed!
Maybe these solvers using word2vec similarity do have an unfair advantage though – how do they perform with a different similarity model? Using USE instead, I expected the cohort solver to be more robust than the gradient solver…
… but it seems that the gradient descent solver is more robust to a disparate similarity model, as one example of the completion scenario shows.
The gradient solver also generally offers some benefit making a suggestion for just the next guess, but the cohort solver’s contribution is marginal at best.
These are of course only single instances of each scenario, and there is significant variation between runs. It’s been interesting to see this play out interactively, but a more comprehensive performance characterisation – with plenty of scope for understanding the influence of hyperparameters – may be in order.
The solvers can also play part or whole games solo (or with other players) in a live environment, using Selenium WebDriver to submit guesses and collect scores. The leading animation above is gradient-USE and below is a faster game using cohort-word2vec.
And that’s it for now! We have multiple solver configurations that can play online by themselves or with other people. They demonstrate how people and machines can collaborate to each bring their own strengths to solving problems; people with creative strategies and machines with a relentless ability to crunch through possibilities. They don’t spoil the fun of solving Semantle yourself or with friends, but they do provide new ways to play and to gain insight into how to improve your own game.
Postscript: seeing in space
Through all this I’ve considered various 3D visualisations of search through a semantic space with hundreds of dimensions. I’ve settled on the version below, illustrating a search for target “habitat” from first guess “megawatt”.
This visualisation format uses cylindrical coordinates, broken out in the figure below. The cylinder (x) axis is the projection of each guess to the line that connects the first guess to the target word. The cylindrical radius is the distance of each guess in embedding space from its projection on this line (cosine similarity seemed smoother than Euclidean distance here). The angle of rotation in cylindrical coordinates (theta) is the cumulative angle between the directions connecting guess n-1 to n and n to n+1. The result is an irregular helix expanding then contracting, all while twisting around the axis from first to last guess.
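For readers who prefer code, here is a rough sketch of that coordinate mapping. It makes some simplifying assumptions that differ from the figure: the radius uses plain Euclidean distance rather than cosine similarity, theta is taken as zero for the first two guesses, and each turn angle is assigned to the guess that completes it. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sub(a: Vector, b: Vector) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _norm(a: Vector) -> float:
    return math.sqrt(_dot(a, a))


def angle_between(a: Vector, b: Vector) -> float:
    """Angle in radians between two vectors (0 if either has zero length)."""
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    cosine = max(-1.0, min(1.0, _dot(a, b) / (na * nb)))
    return math.acos(cosine)


def cylindrical_path(guesses: Sequence[Vector],
                     target: Vector) -> List[Tuple[float, float, float]]:
    """Map each guess embedding to (x, radius, theta).

    x      : projection onto the line from the first guess to the target
    radius : Euclidean distance from the guess to that projection (assumption)
    theta  : cumulative angle between successive step directions
    """
    origin = guesses[0]
    axis = _sub(target, origin)
    axis_length = _norm(axis)
    if axis_length == 0.0:
        raise ValueError("first guess and target must differ")
    unit = [c / axis_length for c in axis]

    coords: List[Tuple[float, float, float]] = []
    theta = 0.0
    for n, guess in enumerate(guesses):
        rel = _sub(guess, origin)
        x = _dot(rel, unit)
        perpendicular = [r - x * u for r, u in zip(rel, unit)]
        radius = _norm(perpendicular)
        if n >= 2:
            previous_step = _sub(guesses[n - 1], guesses[n - 2])
            current_step = _sub(guess, guesses[n - 1])
            theta += angle_between(previous_step, current_step)
        coords.append((x, radius, theta))
    return coords


if __name__ == "__main__":
    # Tiny 3D example; real embeddings have hundreds of dimensions.
    path = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 1.0), (3.0, 0.0, 0.0)]
    for x, radius, theta in cylindrical_path(path, target=(3.0, 0.0, 0.0)):
        print(f"x={x:.2f} r={radius:.2f} theta={theta:.2f}")
```

The three values per guess can then be plotted as a 3D curve with y = r·cos(theta) and z = r·sin(theta).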
It was great to be able to reflect with colleagues on common themes running through Thoughtworks’ work in languages and technology. In various scenarios, with different technology approaches, we worked to improve the inclusiveness of solutions, pointing to a more linguistically inclusive future.
I was excited to read Rebooting AI (website), to find inspiration and tools for doing things better. Here is the book in one great quote:
For now, we are in a kind of interregnum: narrow but networked intelligences with autonomy, but too little genuine intelligence to be able to reason about the consequences of that power.
There is a lot to like. Marcus & Davis clearly map out the history and landscape of AI challenges, plus plausible elements of future solutions. They provide useful tools for thinking about problems with partial solutions to intelligence, such as the “fundamental over attribution error” and the “illusory progress gap”. They show how current ML solutions based on big data can be opaque and brittle. They demonstrate how key attributes of human intelligence instead allow the development of rich cognitive models – such as how language and the real world work – and how solutions incorporating such models would address current shortcomings, enabling AI to tackle open-ended tasks. This is great material for a general reader.
Where I felt the book fell short was that it didn’t build many bridges between our current “narrow but networked intelligences” and the authors’ posited future state capabilities. The future state reads like Artificial General Intelligence (AGI) by another name, fleshed out by scenarios that are short on implementation detail. Though sometimes mundane, from our current perspective, Arthur C Clarke might describe them as “indistinguishable from magic” and hence Rodney Brooks would say they are “no longer falsifiable”. We know there’s a massive chasm between current ML solutions and AGI, but I didn’t find much to close or bridge the chasm in Rebooting AI.
Some of these future capabilities are illustrated by domain-specific modelling techniques – like formal logic – that would be familiar to many computer science students. But I found this a little incongruous because these techniques have also failed to deliver on promises of realising intelligence, and not done any more to squash the “long tail of edge cases” than other narrow intelligences. Given the diverse facets of intelligence, maybe the paradigm of “narrow but networked intelligences” is the best way to achieve or approximate intelligence, or maybe it’s ultimately illusory progress, but these illustrations didn’t help me resolve that.
There is undeniable value in the current generation of ML solutions. How do we build on these? A detailed analysis of a number of key avenues of short to medium term progress was lacking. For instance, starting with current ML solutions, the authors could have explored:
various designs of hybrid human-machine decision-making systems that augment human abilities while remaining resilient to new scenarios that stump machines;
transfer learning, few-shot learning and sophisticated representation learning like transformers, that have potential to increase the representative and reasoning power of solutions;
the role of ecosystem design and governance, including ongoing monitoring and data curation to correct issues (for instance bias testing, CD4ML, etc).
Instead, ML was stereotyped as fully automated, tabula rasa, E2E.
Finally, to know things are getting better, we need the right baseline and measures. While the language examples clearly demonstrated superficial artificial understanding, and self-driving vehicles have a ways to go, some issues raised were not assessed against incumbent human capabilities on narrow tasks in a like-for-like comparison, but rather against posited capabilities of a future AGI system. I would agree that humans can individually reflect and introspect to recognise their mistakes, but it is still the case that, in operational scenarios, humans make mistakes like artificial systems do. These operational mistakes are moderated by the wider ecosystem in which humans operate, in the same way as predictive inference mode is moderated by a wider human-machine ecosystem. I felt the core issue in some instances was structurally unsound or concentrated decision-making without proper governance, rather than whether or not mistakes were made, and this confounded the analysis. I would have liked to have seen these factors teased out so comparisons could be made in a way that would help to measure progress.
Marcus & Davis do lay out a helpful framework for building trust in AI systems, including stress testing, understanding costs of failures, building in modularity and maintainability, etc. This is good guidance but it would be really helpful to see more detail or case studies under these headlines, to the specificity of other works like Weapons of Math Destruction and Made by Humans.
So, maybe I was hoping for “Refactoring AI” rather than “Rebooting AI”. The book certainly clearly describes problems with the current state, and desirable characteristics of the future state. On balance, the technical arguments may indeed be sounder than my concerns. If you’re curious, I would encourage you to read it and draw your own conclusions. Ultimately, however, I’m disappointed because I didn’t leave inspired and equipped with new insight and new tools for improving AI today, tomorrow, and the day after.
The path to good design is bumpy, as we will demonstrate with four teapots. (Yes, teapots. Teapots are a staple of computer science and philosophy.)
The path to good design matters, because if you are trying to build a design capability, the journey will be smoother if you understand that the path is bumpy.
Leaders who appreciate the bumpy path can facilitate far greater value creation and support a more engaged group of workers.
What is design?
Design is an activity, but also a result: the specification for a product (service), which determines how it is made or delivered.
Performance is a measure of how a product actually functions, for a given task in a given context. Performance in the broadest sense includes emotional responses, static and dynamic physical characteristics, service characteristics, etc. For simplicity, let’s measure performance in monetary terms; eg. lifetime economic value.
Design is important as an activity and a result, because it is the prime determinant of performance that is within your control.
The smooth path
Consider the distinctive teapot from the cover of Don Norman’s Design of Everyday Things, where the handle – instead of opposing – is aligned with the spout.
We know a thing or two about teapots, so we assume this design has very poor performance!
However, we also assume that a traditional design with handle opposed to the spout produces the best performance.
We can plot our smooth model of how performance varies as a function of the angle between spout and handle.
And it’s pretty clear how to find the best design. The more opposing the handle and spout, the better the performance, the more value created, and hence the better the design.
The first bump in the path
However, this model is broken. We can’t interpolate smoothly (linearly) between design points, as demonstrated by the Japanese yokode kyusu, which features a handle at right angles to its spout, to extract every last drop of tea.
With this new insight, and a further assumption that handles in between the points we’ve plotted (eg, 45 degrees) are much worse due to awkward twisting motions when pouring, we can draw a new model, which is already much less smooth.
What’s interesting about this landscape is that most design variants perform pretty poorly, and you must be close to a good design to find it. If you didn’t have the insight into teapot performance that we have assumed – if you had only tested performance at the awkward angles, and you had assumed smooth behaviour in between – you would likely miss the best designs and leave significant value on the table. (Note that the scale of this diagram should be greatly exaggerated to demonstrate the true size of value creation opportunities.)
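To make that risk concrete, here is a toy sketch with entirely made-up numbers. It assumes a hypothetical bumpy performance curve with narrow peaks at 90 and 180 degrees, tests only the awkward angles, and then interpolates smoothly between them. The function names, peak heights and widths are illustrative assumptions, not measurements of any teapot.

```python
import math
from typing import List, Tuple


def bumpy_performance(angle_deg: float) -> float:
    """Hypothetical performance with narrow peaks at 90 and 180 degrees."""
    peaks = [(90.0, 0.9), (180.0, 1.0)]  # (angle, peak height), illustrative only
    width = 10.0  # how close you must be to a good design to notice it
    return sum(height * math.exp(-((angle_deg - angle) / width) ** 2)
               for angle, height in peaks)


def interpolate(samples: List[Tuple[float, float]], angle_deg: float) -> float:
    """Smooth (linear) estimate between sorted (angle, performance) samples."""
    for (a0, p0), (a1, p1) in zip(samples, samples[1:]):
        if a0 <= angle_deg <= a1:
            t = (angle_deg - a0) / (a1 - a0)
            return p0 + t * (p1 - p0)
    raise ValueError("angle outside the sampled range")


if __name__ == "__main__":
    awkward_angles = [0.0, 45.0, 135.0]
    samples = [(a, bumpy_performance(a)) for a in awkward_angles]
    print(f"smooth estimate at 90 degrees: {interpolate(samples, 90.0):.3f}")
    print(f"actual performance at 90 degrees: {bumpy_performance(90.0):.3f}")
```

In this toy landscape the smooth estimate at 90 degrees is close to zero while the actual value is close to the peak, so a team relying on the interpolation would never look there.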
So, this is the first lesson of the bumpy path to good design. We need to explore the performance of multiple design variants, and understand that small changes in design can have enormous impacts on performance, to be confident we are approaching our potential to create value.
So far, we have only explored the impact of one design variable, but for any product we have effectively infinitely many design variables (if we can just conceive them). For instance, the handle of a teapot could also be on top, but we could also consider the shape, material, fixtures, etc. Then we could move beyond the handle to the design of the rest of the teapot!
Now consider the design and delivery of digital products and services. Constraints do exist, but infinite design variants still exist within those constraints. Further, like the rolled up dimensions of string theory, there are extra dimensions of design that are easy to miss, but once discovered can be expanded and explored to create ever more value.
The first lesson
How do leaders get this wrong? By failing to encourage the exploration of a sufficient number of design variants, and by failing to encourage the exploration of minor changes that have outsize impact.
As a leader, you must be prepared to carve out time and space, embrace uncertainty and ambiguity, and bring creativity, compassion and patience to the exploration process. As important as this is to creating value, it is also key to maintaining the engagement of teams involved in or interacting with design.
I’m often told that exploration feels inefficient. Or, rather, felt inefficient. The distinction is important. Hindsight bias distorts the reality that before starting an exploration into a sufficiently bumpy landscape, we simply cannot know what we will find. So how do we measure efficiency of exploration? Certainly not by how quickly we arrive at a design, or by how many designs are discarded. Should we even measure efficiency of exploration? That is a better question. We should focus on net value creation, and do enough exploration to mitigate the risk that we are leaving significant value on the table.
This design sensibility, however, may not be apparent to the whole team. Designers will be frustrated being managed to a smooth path, while others who perceive the challenge to be simple may become frustrated when the bumpiness is allowed to surface. The team’s various activities may have different cadences that sometimes align, and sometimes don’t. This can create friction and dissatisfaction in teams. Some functional conflict is healthy in this regard, but as a leader, you must support and enable a team to focus on what it takes to create value.
The second bump in the path
I have used the word “assume” liberally and deliberately above. I have assumed a large number of things about the tasks that users of the teapots are seeking to achieve, and the broader contexts of use. I have further assumed that my readers share a traditional western notion of teapots and their use. I have done this to keep simple – I hope – the explanation of the first bump.
But “assume” is at the root of the second bump. During product development, we can’t assume performance, we must test designs with users engaged in a task in a context. We may take shortcuts by prototyping, simulating, etc, but we must test as objectively as possible, for a meaningful prediction of a product’s performance, and potential to create value.
In a bumpy design landscape, poor predictions of actual performance carry significant opportunity cost.
(Note also that during the development of a typical digital product/service, we are typically iteratively discovering the task and the context in parallel.)
We assumed, with our teapots above, that a spout aligned with the handle would lead to poor performance, but we didn’t test it (with a minor tweak in a hidden dimension). If we’d tested this traditional oriental design (as UX Designer Mike Eng did), we would have discovered that, for the task of serving oneself, in a solitary context, the aligned handle actually produces superior performance.
I was surprised to find this teapot design existed when I stumbled upon the post from above. I suspect this teapot design has a specific name or an interesting story behind it, but I haven’t been able to track it down. However, it serves as an excellent demonstration that the best design paths are bumpy.
The second lesson
The second lesson is that assumptions about performance, task and context hide the inherent bumpiness in design. As a leader, you must recognise and challenge assumptions, encourage the testing of designs under the correct conditions, and appreciate that our understanding of task and context may evolve with testing.
There are many resources that discuss lightweight and effective approaches to UX research and testing; you could do worse than to start here.
We have discussed two major value creation activities in design:
Exploration and consequent discovery of performant designs
Testing and consequent selection of more performant designs
But these activities are overlooked or de-prioritised with a smooth mindset. While there is uncertainty, ambiguity and friction along the path, and sometimes progress is difficult to discern, as a leader, you must embrace the bumps because – if you are in the business of creating value – there is no smooth path to good design.
It’s a self-guided audio tour of historic sites in Broome, Western Australia, including beautiful stories told by locals. Nyamba Buru Yawuru developed the concept, curated the media, engaged local stakeholders, and were product owners for the app.
This work was exciting for its value to the Broome and Yawuru community, but also because it was an opportunity to innovate under the constraint of building the simplest thing possible. The simplest thing possible was in stark contrast to the technical whizbangery (though lean delivery) of my previous app project – Fireballs in the Sky.
I had fun working on the interaction and visual design challenges under the constraints, and I think the key successes were:
Simplifying presentation of the real-world and in-app navigation as a hand-rolled map (drawn in Inkscape), showing all the sites, that scrolls in a single direction.
Hiding everything unnecessary during playback of stories, to allow the user to focus on the place and the story.
Playback control behaviour across sites and the main map.
Not succumbing to the temptation to add geo-location, background audio, or anything else that could have added to the complexity!
Stop testing on humans! Auto manufacturers have greatly reduced the harm once caused by inadvertently crash-testing production cars with real people. Now, simulation ensures every new car endures thousands of virtual crashes before even a dummy sets foot inside. Can we do the same for software product delivery?
Simulation can deliver faster feedback than real-world trials, for less cost. Simulation supports agility, improves quality and shortens development cycles. Designers and manufacturers of physical products found this out a long time ago. By contrast, in Agile software development, we aim to ship small increments of real software to real people and use their feedback to guide product development. But what if that’s not possible? (And can we still benefit from simulation even when it is?)
The goal of trials remains the same: get a good product to market as quickly as possible (or pivot or kill a bad product as quickly as possible). However, if you have to wait for access to human subjects or real software, or if it’s too costly to scale to the breadth and depth of real-world trials required to optimise design and minimise risk, consider simulation.
Learn why simulation was chosen for the design of call centre services (and compare this with crash testing cars), how a simulator was developed, and what benefits the approach brought. You’ll leave equipped to decide whether simulation is appropriate for your next innovation project, and with some resources to get you started.
How and when to use simulation to improve agility
The anatomy of a simulator
A lean, risk-based approach to developing and validating a simulator
Techniques for effectively visualising and communicating simulations