Throwback Thursday

The metaverse is a topic currently, though the concept has a long history. Twenty years ago, in the dotcom era, I was exploring this space, as I was recently reminded. Feeling nostalgic, I dug these projects out of the NAS archives. Tech has moved on, but there’s enduring relevance in what I learned.

VO2max (1999)

Virtual Online Orienteering, to the max! Conceived as similar to Catching Features, but a game that was online by default, as you could play in the browser in a virtual world authored in VRML97.

A screenshot of a virtual orienteering game showing a first person view of terrain, a start banner and a compass, with a list of checkpoints underneath.

The UI consisted of two browser windows: a first person view and motion controls (using the now defunct Cosmo Player), and a map in a second window. 

A screenshot of a virtual orienteering game showing a first person view of terrain, a boulder with a checkpoint next to it, and a compass, with a list of checkpoints underneath.

The draggable compass needle, the checkpoints and the course logic (must visit checkpoints in order), and a widget that visualised your completed route as an electric blue string hovering 3 feet above the ground were all modular VRML Protos.

A screenshot of a virtual orienteering game showing a first person view of terrain, the time to complete the course, and a blue line that shows the route taken to complete the course.

The map and terrain for the only level ever created were generated together with a custom C++ application. I was pretty pleased this all worked, and it demonstrated some concepts for…

4DUniverse (2000)

4DUniverse was a broad concept for virtual online worlds for socialising, shopping, gaming, etc, similar at the time to ActiveWorlds (which still exists today!), but again accessible through the browser (assuming you had a VRML plugin).

I thought I’d have great screengrabs to illustrate this part of the story, but I was surprised how few I’d captured, that they were very low resolution, and that they were in archaic formats. The source artefacts from this post – WRL, HTML, JS, JAVA, etc – have lost no fidelity, but would only meet modern standards and interpreters to various degrees. Maybe I will modernise them someday and generate new images to do justice to the splendour I held in my mind!

A variety of screengrabs of virtual worlds from the 4DUniverse. Three different worlds with different themes are shown. Vaguely Greek open hotel lobby, Gothic cathedral in orbit, etc

We authored a number of worlds, connected by teleports, with the tools we had to hand, being text editors, spreadsheets, and custom scripts. While a lot of fun, we came to the conclusion that doing the things we envisaged in the 4DUniverse wasn’t any more compelling than doing them in the 2D interfaces of the time. VRML eventually went away, and probably because no one else was able to make a compelling case for its use. At least I crafted a neat animated GIF logo rendered with POV-Ray.

4 capital D's with a metallic finish joined in a flower shape that is rotating

Less multiverse, and more quantum realm, I also generated VRML content at nanoscale with…

NanoCAD (2001)

NanoCAD was a neat little (pun intended) CAD application for designing molecules, which I extended with a richer editing UI, supporting the design of much more complex hypothesised molecular mechanisms.

NanoCAD UI showing 3D view of construction of a large buckytube from graphene sheets. Application controls and help text are shown in the lower panel.

The Java app allowed users to place atoms in 3D and connect them with covalent bonds. Then an energy solver would attempt to find a stable configuration for the molecules (using classical rather than quantum methods). With expressive selection, duplication and transformation mechanics, it was possible to create benzene rings, stitch them into graphene sheets, and roll them up into “enormous” buckytubes, or other complex carbon creations.

NanoCAD UI showing detailed 3D view of construction of a large buckytube from graphene sheets. Application controls and help text are shown in the lower panel. Further desktop background including MS Word, Windows Explorer and WinAmp evoke the 2001 era

I also created cables housed inside sheaths, gears – built with benzene ring teeth attached to buckytubes – and other micro devices. If 4DUniverse was inspired by Snow Crash, NanoCAD was inspired by The Diamond Age. Nanocad could run in a browser as an applet and the molecules could also be exported as WRL files for display in other viewers.

3D visualisation in the space-filling style of a molecular gear made from benzene ring teeth attached to the outside of a buckytube

Comparing contemporary professional projects

It’s nice to contrast the impermanence of these personal projects with the durability of my contemporary professional work with ANCA Machines. At the time, I was documenting the maths and code of Cimulator3D and also developing the maths and bidirectional UI design for iFlute, both used in the design and manufacturing of machine tools via grinding processes. Both products are still on the market in similar form today, more than two decades later.

I wonder how I’ll view this post in another twenty years?

Synthesising Semantle Solvers

Picking up threads from previous posts on solving Semantle word puzzles with machine learning, we’re ready to explore how different solvers might play along with people while playing the game online. Maybe you’d like to play speed Semantle against an artificially intelligent opponent, maybe you’d like a left-of-field hint on a tricky puzzle, or maybe it’s just fun to spectate at a cerebral robot battle.

Animation of a Semantle game from initial guess to completion

Substitute semantics

The solvers have a view of how words relate due to a similarity model that is encapsulated for ease of change. To date, we’ve used the same model as live Semantle, which is word2vec. But as this might be considered cheating, we can now also use a model based on the Universal Sentence Encoder (USE), to explore how the solvers perform with separated semantics.

Solver spec

To recap, the key elements of the solver ecosystem are now:

  • SimilarityModel – choice of word2vec or USE as above,
  • Solver methods (common to both gradient and cohort variants):
    • make_guess() – return a guess that is based on the solver’s current state, but don’t change the solver’s state,
    • merge_guess(guess, score) – update the solver’s state with information about a guess and a score,
  • Scoring of guesses by either the simulator or a Semantle game, where a game could also include guesses from other players.
Diagram illustrating elements of the solver ecosystem. Similarity model initialises solver state used to make guesses, which are scored by game and update solver state with scores. Other players can make guesses which also get scored

It’s a simplified reinforcement learning setup. Different combinations of these elements allow us to explore different scenarios.

Solver suggestions

Let’s look at how solvers might play with people. The base scenario friends is the actual history of a game played with people, completed in 109 guesses.

Word2Vec similarity

Solvers could complete a puzzle from an initial sequence of guesses from friends. Both solvers in this particular configuration generally easily better the friends result when primed with the first 10 friend guesses.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Cohort finishes with fewest guesses, then gradient, then friends, with clear separation.

Solvers could instead make the next guess only, but based on the game history up to that point. Both solvers may permit a finish in slightly fewer guesses. The conclusion is that these solvers are good for hints, especially if they are followed!

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Cohort finishes with fewest guesses, then gradient, then friends, with marginal differences.

Maybe these solvers using word2vec similarity do have an unfair advantage though – how do they perform with a different similarity model? Using USE instead, I expected the cohort solver to be more robust than the gradient solver…

USE similarity

… but it seems that the gradient descent solver is more robust to a disparate similarity model, as one example of the completion scenario shows.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Gradient finishes with fewest guesses, then friends, then cohort, and the separation is clear.

The gradient solver also generally offers some benefit making a suggestion for just the next guess, but the cohort solver’s contribution is marginal at best.

Line chart comparing three irregular but increasing lines that represent the sequence of scores for guesses in a semantle game. The three lines are labelled friends, cohort, and gradient. Gradient finishes with fewest guesses, then friends, and cohort doesn't finish, but the differences are very minor.

These are of course only single instances of each scenario, and there is significant variation between runs. It’s been interesting to see this play out interactively, but a more comprehensive performance characterisation – with plenty of scope for understanding the influence of hyperparameters – may be in order.

Solver solo

The solvers can also play part or whole games solo (or with other players) in a live environment, using Selenium WebDriver to submit guesses and collect scores. The leading animation above is gradient-USE and a below is a faster game using cohort-word2vec.

Animation of a Semantle game from initial guess to completion

So long

And that’s it for now! We have multiple solver configurations that can play online by themselves or with other people. They demonstrate how people and machines can collaborate to each bring their own strengths to solving problems; people with creative strategies and machines with a relentless ability to crunch through possibilities. They don’t spoil the fun of solving Semantle yourself or with friends, but they do provide new ways to play and to gain insight into how to improve your own game.

Postscript: seeing in space

Through all this I’ve considered various 3D visualisations of search through a semantic space with hundreds of dimensions. I’ve settled on the version below, illustrating a search for target “habitat” from first guess “megawatt”.

An animated rotating 3D view of an semi-regular collection of points joined by lines into a sequence. Some points are labelled with words. Represents high-dimensional semantic search in 3D.

This visualisation format uses cylindrical coordinates, broken out in the figure below. The cylinder (x) axis is the projection of each guess to the line that connects the first guess to the target word. The cylindrical radius is the distance of each guess in embedding space from its projection on this line (cosine similarity seemed smoother than Euclidian distance here). The angle of rotation in cylindrical coordinates (theta) is the cumulative angle between the directions connecting guess n-1 to n and n to n+1. The result is an irregular helix expanding then contracting, all while twisting around the axis from first to lass guess.

Three line charts on a row, with common x-axis of guess number, showing semi-regular lines, representing the cylindrical coordinates of the 3D visualisation. The left chart is x-axis, increasing from 0 to 1, middle is radius, from 0 to ~1 and back to 0, and right is angle theta, increasing from 0 to ~11 radians.

Bridging the linguistic inclusion gap with AI

It was great to be able to reflect with colleagues on common themes running through Thoughtworks’ work in languages and technology. In various scenarios, with different technology approaches, we worked to improve the inclusiveness of solutions, pointing to a more linguistically inclusive future.

Rebooting AI Review

I was excited to read Rebooting AI (website), to find inspiration and tools for doing things better. Here is the book in one great quote:

For now, we are in a kind of interregnum: narrow but networked intelligences with autonomy, but too little genuine intelligence to be able to reason about the consequences of that power.

There is a lot to like. Marcus & Davis clearly map out the history and landscape of AI challenges, plus plausible elements of future solutions. They provide useful tools for thinking about problems with partial solutions to intelligence, such as the “fundamental over attribution error” and the “illusory progress gap”. They show how current ML solutions based on big data can be opaque and brittle. They demonstrate how key attributes of human intelligence instead allow the development of rich cognitive models – such as how language and the real world work – and how solutions incorporating such models would address current shortcomings, enabling AI to tackle open-ended tasks. This is great material for a general reader.

Where I felt the book fell short was that it didn’t build many bridges between our current “narrow but networked intelligences” and the authors’ posited future state capabilities. The future state reads like Artificial General Intelligence (AGI) by another name, fleshed out by scenarios that are short on implementation detail. Though sometimes mundane, from our current perspective, Arthur C Clarke might describe as them “indistinguishable from magic” and hence Rodney Brooks would say they are “no longer falsifiable”.  We know there’s a massive chasm between current ML solutions and AGI, but I didn’t find much to close or bridge the chasm in Rebooting AI.

Some of these future capabilities are illustrated by domain-specific modelling techniques – like formal logic – that would be familiar to many computer science students. But I found this a little incongruous because these techniques have also failed to deliver on promises of realising intelligence, and not done any more to squash the “long tail of edge cases” than other narrow intelligences. Given the diverse facets of intelligence, maybe the paradigm of “narrow but networked intelligences” is the best way to achieve or approximate intelligence, or maybe it’s ultimately illusory progress, but these illustrations didn’t help me resolve that.

There is undeniable value in the current generation of ML solutions. How do we build on these? A detailed analysis of a number of key avenues of short to medium term progress was lacking. For instance, starting with current ML solutions, the authors could have explored:

  • various designs of hybrid human-machine decision-making systems that augment human abilities while remaining resilient to new scenarios that stump machines;
  • transfer learning, few-shot learning and sophisticated representation learning like transformers, that have potential to increase the representative and reasoning power of solutions;
  • the role of ecosystem design and governance, including ongoing monitoring and data curation to correct issues (for instance bias testing, CD4ML, etc).

Instead, ML was stereotyped as fully automated, tabula rasa, E2E.

Finally, to know things are getting better, we need the right baseline and measures. While the language examples clearly demonstrated superficial artificial understanding, and self-driving vehicles have a ways to go, some issues raised were not assessed against incumbent human capabilities on narrow tasks in a like-for-like comparison, but rather against posited capabilities of a future AGI system. I would agree that humans can individually reflect and introspect to recognise their mistakes, but it is still the case that, in operational scenarios, humans make mistakes like artificial systems do. These operational mistakes are moderated by the wider ecosystem in which humans operate, in the same way as predictive inference mode is moderated by a wider human-machine ecosystem. I felt the core issue in some instances was structurally unsound or concentrated decision-making without proper governance, rather than whether or not mistakes were made, and this confounded the analysis. I would have liked to have seen these factors teased out so comparisons could be made in a way that would help to measure progress.

Marcus & Davis do lay out a helpful framework for building trust in AI systems, including stress testing, understanding costs of failures, building in modularity and maintainability, etc. This is good guidance but it would be really helpful to see more detail or case studies under these headlines, to the specificity of other works like Weapons of Math Destruction and Made by Humans.

So, maybe I was hoping for “Refactoring AI” rather than “Rebooting AI”. The book certainly clearly describes problems with the current state, and desirable characteristics of the future state. On balance, the technical arguments may indeed be sounder than my concerns. If you’re curious, I would encourage you to read it and draw your own conclusions. Ultimately, however, I’m disappointed because I didn’t leave inspired and equipped with new insight and new tools for improving AI today, tomorrow, and the day after.

No Smooth Path to Good Design

The path to good design is bumpy, as we will demonstrate with four teapots. (Yes, teapots. Teapots are a staple of computer science and philosophy.)

The path to good design matters, because if you are trying to build a design capability, the journey will be smoother if you understand that the path is bumpy.

Leaders who appreciate the bumpy path can facilitate far greater value creation and support a more engaged group of workers.

What is design?

Design is an activity, but also a result: the specification for a product (service), which determines how it is made or delivered.

Performance is a measure of how a product actually functions, for a given task in a given context. Performance in the broadest sense includes emotional responses, static and dynamic physical characteristics, service characteristics, etc. For simplicity, let’s measure performance in monetary terms; eg. lifetime economic value.

Design is important as an activity and a result, because it is the prime determinant of performance that is within your control.

The smooth path

Teapot by Norman
Teapot by Norman[1]
Consider the distinctive teapot from the cover of Don Norman’s Design of Everyday Things, where the handle – instead of opposing – is aligned with the spout.

We know a thing or two about teapots, so we assume this design has very poor performance!

However, we also assume that a traditional design with handle opposed to the spout produces the best performance.

We can plot our smooth model of how performance varies as a function of the angle between spout and handle.

Performance of teapot design variants
Performance of teapot design variants

And it’s pretty clear how to find the best design. The more opposing the handle and spout, the better the performance, the more value created, and hence the better the design.

The first bump in the path

Yokode Kyusu
Yokode Kyusu [1]
However, this model is broken. We can’t interpolate smoothly (linearly) between design points, as demonstrated by the Japanese yokode kyusu, which features a handle at right angles to its spout, to extract every last drop of tea.

With this new insight, and a further assumption that handles in between the points we’ve plotted (eg, 45 degrees) are much worse due to awkward twisting motions when pouring, we can draw a new model, which is already much less smooth.

Teapot performance with new information
Teapot performance with new information

What’s interesting about this landscape is that most design variants perform pretty poorly, and you must be close to a good design to find it. If you didn’t have the insight into teapot performance that we have assumed – if you had only tested performance at the awkward angles, and you had assumed smooth behaviour in between – you would likely miss the best designs and leave significant value on the table. (Note that the scale of this diagram should be greatly exaggerated to demonstrate the true size of value creation opportunities.)

Value created by discovery
Value created by exploration

So, this is the first lesson of the bumpy path to good design. We need to explore the performance of multiple design variants, and understand that small changes in design can have enormous impacts on performance, to be confident we are approaching our potential to create value.

Teapot with handle on top
Teapot with handle on top [3]
So far, we have only explored the impact of one design variable, but for any product we have effectively infinitely many design variables (if we can just conceive them). For instance, the handle of a teapot could also be on top, but we could also consider the shape, material, fixtures, etc. Then we could move beyond the handle to the design of the rest of the teapot!

Now consider the design and delivery of digital products and services. Constraints do exist, but infinite design variants still exist within those constraints. Further, like the rolled up dimensions of string theory, there are extra dimensions of design that are easy to miss, but once discovered can be expanded and explored to create ever more value.

The first lesson

How do leaders get this wrong? By failing to encourage the exploration of a sufficient number of design variants, and by failing to encourage the exploration of minor changes that have outsize impact.

As a leader, you must be prepared to carve out time and space, embrace uncertainty and ambiguity, and bring creativity, compassion and patience to the exploration process. As important as this is to creating value, it is also key to maintaining the engagement of teams involved in or interacting with design.

I’m often told that exploration feels inefficient. Or, rather, felt inefficient. The distinction is importation. Hindsight bias distorts the reality that before starting an exploration into a sufficiently bumpy landscape, we simply cannot know what we will find. So how do we measure efficiency of exploration? Certainly not by how quickly we arrive at a design, or by how many designs are discarded. Should we even measure efficiency of exploration? That is a better question. We should focus on net value creation, and do enough exploration to mitigate the risk that we are leaving significant value on the table.

This design sensibility, however, may not be apparent to the whole team. Designers will be frustrated being managed to a smooth path, while others who perceive the challenge to be simple may become frustrated when the bumpiness is allowed to surface. The team’s various activities may have different cadences that sometimes align, and sometimes don’t. This can create friction and dissatisfaction in teams. Some functional conflict is healthy in this regard, but as a leader, you must support and enable a team to focus on what it takes to create value.

The second bump in the path

I have used word “assume” liberally and deliberately above. I have assumed a large number of things about the tasks that users of the teapots are seeking to achieve, and the broader contexts of use. I have further assumed that my readers share a traditional western notion of teapots and their use. I have done this to keep simple – I hope – the explanation of the first bump.

But “assume” is at the root of the second bump. During product development, we can’t assume performance, we must test designs with users engaged a task in a context. We may take shortcuts by prototyping, simulating, etc, but we must test as objectively as possible, for a meaningful prediction of a product’s performance, and potential to create value.

In a bumpy design landscape, poor predictions of actual performance carry significant opportunity cost.

Value created by testing
Value created by testing

(Note also that during the development of a typical digital product/service, we are typically iteratively discovering the task and the context in parallel.)

We assumed, with our teapots above, that a spout aligned with the handle would lead to poor performance, but we didn’t test it (with a minor tweak in a hidden dimension). If we’d tested this traditional oriental design (as UX Designer Mike Eng did), we would have discovered that, for the task of serving oneself, in a solitary context, the aligned handle actually produces superior performance.

Aligned handle teapot
Aligned handle teapot [4]
I was surprised to find this teapot design existed when I stumbled upon the post from above. I suspect this teapot design has a specific name or an interesting story behind it, but I haven’t been able to track it down. However, it serves as an excellent demonstration that the best design paths are bumpy.

The second lesson

The second lesson is that assumptions about performance, task and context hide the inherent bumpiness in design. As a leader, you must recognise and challenge assumptions, encourage the testing of designs under the correct conditions, and appreciate that our understanding of task and context may evolve with testing.

There are many resources that discuss lightweight and effective approaches to UX research and testing; you could do worse than to start here.


We have discussed two major value creation activities in design:

  • Exploration and consequent discovery of performant designs
  • Testing and consequent selection of more performant designs

But these activities are overlooked or de-prioritised with a smooth mindset. While there is uncertainty, ambiguity and friction along the path, and sometimes progress is difficult to discern, as a leader, you must embrace the bumps because – if you are in the business of creating value – there is no smooth path to good design.

Image credits
  4. Ebay listing from seller

Jetty to Jetty app

I released an app 🙂 – for iOS and Android.

It’s a self-guided audio tour of historic sites in Broome, Western Australia, including beautiful stories told by locals. Nyamba Buru Yawuru developed the concept, curated the media, engaged local stakeholders, and were product owners for the app.

Jetty to Jetty screenshots
Jetty to Jetty screenshots

This work was exciting for its value to the Broome and Yawuru community, but also because it was an opportunity to innovate under the constraint of building the simplest thing possible. The simplest thing possible was in stark contrast to the technical whizbangery (though lean delivery) of my previous app project – Fireballs in the Sky.

I had fun working on the interaction and visual design challenges under the constraints, and I think the key successes were:

  • Simplifying presentation of the real-world and in-app navigation as a hand-rolled map (drawn in Inkscape), showing all the sites, that scrolls in a single direction.
  • Hiding everything unnecessary during playback of stories, to allow the user to focus on the place and the story.
  • Playback control behaviour across sites and the main map.
  • Not succumbing to the temptation to add geo-location, background audio, or anything else that could have added to the complexity!

My colleague Nathan Jones laid the technical foundations – Phonegap/Cordova wrapping a static site built by Middleman and using CoffeeScript, knockout.js, HAML, Sass and HTML5/Cordova plugin for media. He later went on to extend and open-source (as Jila) this framework for the Yawuru Ngan-ga language app. Most of the development work by Nathan and me was done in early 2014.

While intended to be used in Broome (and yet another reason to visit Broome), the app and its beautiful stories can be enjoyed anywhere.

Leave Product Development to the Dummies

This is the talk I gave at Agile Australia 2013 about the role of simulation in product development. Check out a PDF of the slides with brief notes.


"Dummies" talk at Agile Australia

Stop testing on humans! Auto manufacturers have greatly reduced the harm once caused by inadvertently crash-testing production cars with real people. Now, simulation ensures every new car endures thousands of virtual crashes before even a dummy sets foot inside. Can we do the same for software product delivery?

Simulation can deliver faster feedback than real-world trials, for less cost. Simulation supports agility, improves quality and shortens development cycles. Designers and manufacturers of physical products found this out a long time ago. By contrast, in Agile software development, we aim to ship small increments of real software to real people and use their feedback to guide product development. But what if that’s not possible? (And can we still benefit from simulation even when it is?)

The goal of trials remains the same: get a good product to market as quickly as possible (or pivot or kill a bad product as quickly as possible). However, if you have to wait for access to human subjects or real software, or if it’s too costly to scale to the breadth and depth of real-world trials required to optimise design and minimise risk, consider simulation.

Learn why simulation was chosen for the design of call centre services (and compare this with crash testing cars), how a simulator was developed, and what benefits the approach brought. You’ll leave equipped to decide whether simulation is appropriate for your next innovation project, and with some resources to get you started.


  • How and when to use simulation to improve agility
  • The anatomy of a simulator
  • A lean, risk-based approach to developing and validating a simulator
  • Techniques for effectively visualising and communicating simulations
  • Implementing simulated designs in the real world