ML Interpretability with Ambient Visualisations

I produced some ambient visualisations as background to short talks on the theme of Interpreting the Opaque Box of ML from ThoughtWorks Technology Radar Volume 21. The talks were presented during breaks at the YOW! Developer Conference.

Animation of linear to non-linear model selection

Here are my speaker notes.

Theme Intro

The theme I’m talking about is Interpreting the Opaque Box of ML.

It’s a theme because the radar has a lot of ML blips – those are the individual tools, techniques, languages and frameworks we track, and they all have an aspect of interpretability.

I’m going to talk first about Explainability as a First Class Model Concern.

Explainability as a First Class Model Concern

ML models make predictions. They take some inputs and predict an output, based on the data they’ve been trained on. Without careful thought, those predictions can be opaque boxes.

For example – predicting whether someone should be offered credit. A few people at the booth have mentioned this experience: “[the] black box algorithm thinks I deserve 20x the credit limit [my wife] does” – and the difficulty of getting an explanation from the provider [this was a relevant example at the time].

Elevated to a first class concern, however, ML predictions are interpretable and explainable to different degrees – it’s not actually a question of opaque box or transparent box, but many shades of translucency.

Spectrum

Interpretable means people can reason about a model’s decision-making process in general terms, while explainable means people can understand the factors that led to a specific decision. People are important in this definition – a data scientist may be satisfied with the explanation that the model minimises total loss, while a declined credit applicant probably requires and deserves a reason code.

And those two extremes can anchor our spectrum – at one end we can explain a result as a general consequence of ML, at the other end explaining the specific factors that contributed to an individual decision.

Dimensions – What

As dimensions of explainability, we should consider:

  • The choice of modelling technique as intrinsically explainable
  • Model agnostic explainability techniques
  • Whether global or just local interpretability is required

Considering model selection – a decision tree is intrinsically explainable – factors contribute sequentially to a decision. A generic deep neural network is not. However, in between, we can architect networks to use techniques such as embeddings, latent spaces or transfer learning, which create representations of inputs that are distinct and interpretable to a degree, but not always in human terms.

And so model-specific explainability relies on the modelling technique, while model-agnostic techniques are empirically applicable to any model. We can create surrogate explainable models for any given model, such as a wide network paired with a deep network, and we can use ablation to explore the effect of changing inputs on a model’s decisions.
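To make the surrogate idea concrete, here is a minimal sketch (my own illustration with toy data, not from the radar) that trains a shallow decision tree to mimic an opaque gradient-boosted model, then checks how faithfully it does so:

```python
# Global surrogate: fit an interpretable model to the opaque model's outputs.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)   # toy, non-linear ground truth

opaque = GradientBoostingClassifier().fit(X, y)     # the model to explain
surrogate = DecisionTreeClassifier(max_depth=3)
surrogate.fit(X, opaque.predict(X))                 # mimic predictions, not labels

# Fidelity: how often the surrogate agrees with the opaque model.
fidelity = (surrogate.predict(X) == opaque.predict(X)).mean()
print(f"fidelity: {fidelity:.2%}")
print(export_text(surrogate, feature_names=["f0", "f1", "f2", "f3"]))
```

The fidelity score matters: the surrogate’s neat explanation is only worth as much as its agreement with the original model.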

For a given decision, we might only wish to understand how that decision would have been different had the inputs changed slightly. In this case we are concerned only with local interpretability and explainability, not with the model as a whole, and LIME (Local Interpretable Model-agnostic Explanations) is an effective technique.
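As a sketch of what that looks like in practice, assuming the open-source lime package (pip install lime) and a scikit-learn model – the dataset and model here are illustrative only:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain a single decision: which features pushed this prediction up or down?
explanation = explainer.explain_instance(
    data.data[0], model.predict_proba, num_features=5
)
print(explanation.as_list())  # [(feature condition, weight), ...]
```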

Reasons – Why

As broader business concerns, we should care about explainability because:

  • Knowledge management is crucial for organisations – an interpretable model, such as the Glasgow Coma Scale, may be valued more for people’s ability to use it than its pure predictive performance
  • We must comply with local laws, and it is in all stakeholders’ interests that we act ethically
  • And finally, models can always make mistakes, so a challenge process must be considered, especially as vulnerable people are disproportionately subject to automated decision making

And explainability is closely linked to ethics – hence the rise of ethical bias testing.

Ethical Bias Testing

Powerful, but Concerning

There is rising concern that powerful ML models could cause unintentional harm. For example, a model could be trained to make profitable credit decisions by simply excluding disadvantaged applicants. So we’re seeing a growing interest in ethical bias testing that will help to uncover potentially harmful decisions, and we expect this field to evolve over time.

Measures

There are many statistical measures we can use to detect unfairness in models. These measures compare outcomes for privileged and unprivileged groups under the model. If we find a model is discriminating against an unprivileged group, we can apply various mitigations to reduce the inequality.  

  • Equal Opportunity Difference is the difference in true positive rates between an unprivileged group and a privileged group. A value close to zero is good.
  • The Disparate Impact is the ratio of selection rates between the two groups. The selection rate is the number of individuals selected for the positive outcome divided by the total number of individuals in the group. The ideal value for this metric is 1.
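Both measures are straightforward to compute. A minimal plain-NumPy sketch (the variable names and toy data are mine):

```python
import numpy as np

def equal_opportunity_difference(y_true, y_pred, priv):
    """TPR(unprivileged) - TPR(privileged); close to zero is good."""
    def tpr(mask):
        positives = (y_true == 1) & mask
        return (y_pred[positives] == 1).mean()
    return tpr(~priv) - tpr(priv)

def disparate_impact(y_pred, priv):
    """Ratio of selection rates, unprivileged / privileged; 1 is ideal."""
    return (y_pred[~priv] == 1).mean() / (y_pred[priv] == 1).mean()

# Toy example: y_true is the actual outcome, y_pred the model's decision,
# priv marks membership of the privileged group.
y_true = np.array([1, 1, 0, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
priv   = np.array([True, True, True, True, False, False, False, False])
print(equal_opportunity_difference(y_true, y_pred, priv))  # ~0.33
print(disparate_impact(y_pred, priv))                      # 1.0
```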

These are just two examples of more than 70 different metrics for measuring ethical bias. Choosing what measure or measures to use is an ethical decision itself, and is affected by your goals. For example, there is the choice between optimising for similarity of outcomes across groups or trying to optimise so that similar individuals are treated the same. If individuals from different groups differ in their non-protected attributes, these could be competing goals.

Correction

To correct for ethical bias or unfairness, mitigations can be applied to the data, to the process of generating the model, and to the output of the model.

  • Data can be reweighted to increase fairness, before running the model.
  • While the model is being generated, it can be penalised for ethical bias or unfairness.
  • Or, after the model is generated, its output can be post-processed to remove bias.
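As one concrete example of the first mitigation, here is a simplified sketch of reweighting in the style of Kamiran and Calders. Toolkits such as IBM’s AIF360 implement this properly; the function below is my own plain-NumPy approximation, not a library API.

```python
import numpy as np

def reweigh(group, label):
    """Instance weights that make group and label statistically independent:
    w(g, l) = P(group=g) * P(label=l) / P(group=g, label=l)."""
    group, label = np.asarray(group), np.asarray(label)
    weights = np.ones(len(label))
    for g in np.unique(group):
        for l in np.unique(label):
            cell = (group == g) & (label == l)
            if cell.any():
                expected = (group == g).mean() * (label == l).mean()
                weights[cell] = expected / cell.mean()
    return weights

# Usage sketch: model.fit(X, y, sample_weight=reweigh(privileged_mask, y))
```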

As with explainability, removing ethical bias or improving fairness will likely reduce a model’s predictive performance or accuracy; however, there is a continuum of possible trade-offs.

What-If Tool

What is the What-If Tool

I mentioned tooling is being developed to help with explainability and ethical bias testing, and you should familiarise yourself with these tools and the techniques they use. One example is the What-If Tool – an interactive visual interface designed to help you dig into a model’s behaviour. It helps data scientists understand more about the predictions their models make, and was launched by Google’s PAIR (People + AI Research) team.

Features

You can do things like:

  •  Compare models to each other
  •  Visualize feature importance
  •  Arrange datapoints by similarity
  •  Test algorithmic fairness constraints
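A minimal notebook sketch of wiring it up – assuming pip install witwidget and TensorFlow; the two-example dataset and the stand-in predict function are illustrative only:

```python
import tensorflow as tf
from witwidget.notebook.visualization import WitConfigBuilder, WitWidget

def make_example(age, income, label):
    # Encode one record as a tf.Example proto, as the tool expects.
    return tf.train.Example(features=tf.train.Features(feature={
        "age": tf.train.Feature(float_list=tf.train.FloatList(value=[age])),
        "income": tf.train.Feature(float_list=tf.train.FloatList(value=[income])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))

examples = [make_example(30.0, 50000.0, 1), make_example(45.0, 30000.0, 0)]

def predict_fn(examples):
    # Stand-in for a real model: score rises with income (illustrative only).
    scores = [e.features.feature["income"].float_list.value[0] / 100000
              for e in examples]
    return [[1 - s, s] for s in scores]

# Renders the interactive interface inline in a Jupyter notebook.
WitWidget(WitConfigBuilder(examples).set_custom_predict_fn(predict_fn), height=600)
```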

Risk

But by themselves tools like this won’t give you explainability or fairness, and using them naively won’t remove the risk or minimize the damage done by a misapplied or poorly trained algorithm. They should be used by people who understand the theory and implications of the results. However, they can be powerful tools to help communicate, tell a story, make the specialised analysis more accessible, and hence motivate improved practice and outcomes.

CD4ML

The radar also mentions CD4ML for the second time – using Continuous Delivery practices to deliver ML solutions. CD in general encourages solutions to evolve in small steps, and the same is true for ML solutions. The benefit is that we can more accurately identify the reasons for any change in system behaviour when it is the result of small changes in design or data. So we also highlight CD4ML as a technique for addressing explainability and ethical bias.

Scaling Change

Once upon a time, scaling production may have been enough to be competitive. Now, the most competitive organisations scale change to continually improve customer experience. How can we use what we’ve learned scaling production to scale change?

Metaphors for scaling

I recently presented a talk titled “Scaling Change”. In the talk I explore the connections between scaling production, sustaining software development, and scaling change, using metaphors, maths and management heuristics. The same model of change applies from organisational, marketing, design and technology perspectives.  How can factories, home loans and nightclubs help us to think about and manage change at scale?

Read on with the spoiler post if you’d rather get right to the heart of the talk.

Scaling Change Spoiler

When software engineers think about scaling, they think in terms of the order of complexity, or “Big-O”, of a process or system. Whereas production is O(N) and can be scaled by shifting variable costs to fixed, I contend that change is O(N²) due to the interaction of each new change with all previous changes. We could visualise this as a triangular matrix heat map of the interaction cost of each pair of changes (where darker shading is higher cost).

Change interaction heatmap

The thing about change being O(N²) is that the old production management heuristics of shifting variable cost to fixed no longer work, because interaction costs dominate. Instead we use the following management heuristics:

Socialise

Socialising change

We take a variable cost hit for each change to help it play more nicely with every other change. This reduces the cost coefficient but not the number of interactions (N²).

Screen

Screening change

We only take in the most valuable changes. Screening half our changes (N/2) reduces change interactions by three quarters (N²/4).

Seclude

Secluding change

We arrange changes into separate spaces and prevent interaction between spaces. Using n spaces reduces the interactions to N²/n.

Surrender

Surrendering change

Like screening, but at the other end. We actively manage out changes to reduce interactions. Surrendering half our changes (N/2) reduces change interactions by three quarters (N²/4).
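To make the arithmetic behind the four heuristics concrete, here is a small Python sketch (my own illustration) counting pairwise interactions:

```python
def interactions(n):
    """Pairwise interactions among n changes: n(n-1)/2, which is O(N^2)."""
    return n * (n - 1) // 2

N = 100
print("baseline:         ", interactions(N))           # 4950
print("screen to N/2:    ", interactions(N // 2))      # 1225, about a quarter
print("seclude, 4 spaces:", 4 * interactions(N // 4))  # 1200, about N^2/4 overall

# Socialising keeps the same count but lowers the cost per interaction;
# surrendering is the same arithmetic as screening, applied at the other end.
```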

Scenarios

Where do we see these approaches being used? Just some examples:

  • Start-ups screen or surrender changes and hence are more agile than incumbents because they have less history of change.
  • Product managers screen changes in design and seclude changes across a portfolio – for example, the separate apps of Facebook, Messenger, Instagram, Hyperlapse, Layout, Boomerang, and so on
  • To manage technical debt, good developers socialise via refactoring, better ones seclude through architecture, and the best surrender
  • In hiring, candidates are screened and socialised through rigorous recruitment and training processes
  • Brand architectures also seclude changes – Unilever’s Dove can campaign for real beauty while Axe/Lynx offends Dove’s targets (and many others).

See Also

The life-changing magic of tidying your work

Surprise! Managing work in a large organisation is a lot like keeping your belongings in check at home.

Get it wrong at home and you have mess and clutter. Get it wrong in the organisation and you have excessive work in progress (WIP), retarding responsiveness, pulverising productivity, and eroding engagement.

Reading Marie Kondo’s The Life-Changing Magic of Tidying Up (Amazon), I was struck by a number of observations about tidying personal belongings that resonated with how individuals, teams and organisations manage their work.

First, reading TLCMOTU helped me tidy my things better. Second, it reinforced lean and agile management principles.

I won’t review the book here. Maybe the methods and ideas resonate with you, maybe they don’t. However, because I think tidying is something that everyone can relate to, I will compare some of KonMari’s (as Marie Kondo is known) explanations of the management of personal belongings with the management of work in organisations. The translation heuristic is to replace “stuff” with “work”, and “clutter” with “excessive WIP”, to highlight the parallels.

I’d love to know if you find the comparison useful.

On the complexity of work storage systems

KonMari writes:

Most people realise that clutter is caused by too much stuff. But why do we have too much stuff? Usually it is because we do not accurately grasp how much we actually own. And we fail to grasp how much we own because our storage methods are too complex.

Organisations typically employ complex storage methods for their work: portfolio and project management systems with myriad arcane properties, intricate plans, baselines and revisions, budget and planning cycle constraints, capitalisation constraints, fractional resource allocations, and restricted access to specialists who are removed from the outcomes but embrace the management complexity.

And this is just the work that’s stored where it should be. Then there’s all the work that’s squirrelled away into nooks and crannies that has to be teased out by thorough investigation (see below).

Because organisations don’t comprehend the extent of their work, they invent ever-more complex systems to stuff work into storage – that is, to maximise utilisation of capacity – which continues to hide the extent of the work.

Thus, we fail to grasp how much work is held in the organisation, and the result is excessive WIP, which inflates lead times and reduces productivity, failing customers and leaving workers disengaged. Simplifying the storage of work – as simple as cards on a wall, with the information we actually need to deliver outcomes – allows us to comprehend the work we hold, and allows us to better manage WIP for responsiveness and productivity.

On making things visible

KonMari observes that you cannot accurately assess how much stuff you have without seeing it all in one place. She recommends searching the whole house first, bringing everything to the one location, and spreading the items out on the floor to gain visibility.

Making work visible, in one place, to all stakeholders is a tenet of agile and lean delivery. It reveals amazing insights, many unanticipated, about the volume, variety and value (or lack of) of work in progress. The shared view helps build empathy and collaboration between stakeholders and delivery teams. You may need to search extensively within the organisation to discover all the work, but understanding of the sources of demand (as below) will guide you. A great resource for ideas and examples of approaches is Agile Board Hacks.

So get your work on cards on a wall so you can see the extent of your WIP.

On categories

KonMari observes that items in one category are stored in multiple different places, spread out around the house. Categories she identifies include clothes, books, etc. She contends that it’s not possible to assess what you want to keep and discard without seeing the sum of your belongings in each category. Consequently, she recommends thinking in terms of category, rather than place.

If we think organisationally in terms of place, we think of silos – projects, teams, functions. We can’t use these storage units to properly assess the work we hold in the organisation. Internal silos don’t reflect how we serve customers.

Instead, if we think organisationally in terms of category, we are thinking strategically. With a cascading decomposition of strategy, driven by the customer, we can assess the work in the organisation at every level for strategic alignment (strategy being emergent as well as explicit). Strategy could be enterprise level themes, or the desired customer journey at a product team level.

With work mapped against strategy, we can see in one place the sum of efforts to execute a given branch of strategy, and hence assess what to keep and what to discard. We further can assess whether the entire portfolio of work is sufficiently aligned and diversified to execute strategy.

So use your card wall to identify how work strategically serves your customers.

On joy

KonMari writes:

The best way to choose what to keep and what to throw away is to … ask: ‘Does this spark joy?’ If it does, keep it. If not, throw it out.

We may ask of each piece of work: ‘Is this work valuable?’ ‘Is it aligned to the purpose of the organisation?’ ‘Is it something customers want?’ If it is, keep it. If not, throw it out.

KonMari demonstrates why this is effective by taking the process to its logical conclusion. If you’ve discarded everything that doesn’t spark joy, then everything you have, everything you interact with, does spark joy.

What better way to spark joy in your people than to reduce or eliminate work with no value and no purpose?

On discarding first

KonMari observes that storage considerations interrupt the process of discarding. She recommends that discarding comes first, and storage comes second, and the activities remain distinct. If you start to think about where to put something before you have decided whether to keep or discard it, you will stop discarding.

Prioritisation is the act of discarding work we do not intend to pursue. Prioritisation comes first, based purely on value, before implementation considerations. Sequencing can be done with knowledge of effort and other dependencies. Then scheduling, given capacity and other constraints, is the process of deciding which “drawers” to put work in.

On putting things away

KonMari observes that mess and clutter is a result of not putting things away. Consequently she recommends that storage systems should make it easy to put things away, not easy to get them out.

Excessive WIP may also be caused by a failure to rapidly stop work (or perceived inability to do so). Organisational approaches to work should reduce the effort needed to stop work. For instance, with continuous delivery, a product is releasable at all times, and can therefore be stopped after any deployment. Work should be easily stoppable in preference to easily startable. (This could also be framed as “stop starting and start finishing”.)

Further, while many organisations aim for responsiveness with a stoppable workforce (of contractors), they should instead aim for a stoppable portfolio, and workforce responsiveness will follow.

On letting things go

A client of KonMari’s comments:

Up to now, I believed it was important to do things that added to my life …  I realised for the first time that letting go is even more important than adding.

I have written about the importance of letting go of work from the perspective of via negativa management in Dumbbell Delivery; Antifragile Software, and managing socialisation costs in Your Software is a Nightclub.

However, KonMari also observes that, beyond the mechanics of managing stuff (or work), there is a psychological cost of clutter (or excessive WIP). Her clients often report feeling constrained by perceived responsibility to stuff that brings them no joy. I suspect the same is true in the organisation: we fail to recognise and embrace possibilities because we are constrained by perceived responsibilities to work that ultimately has no value.

Imagine if we could throw off those shackles. That’s worth letting a few things go.

Arguments with Agency

Here are slides from my talk at LASTconf 2015. The title is “Bring Your A-Game to Arguments for Change”. The premise is that there are different types of arguments, more or less suited to various organisational and delivery scenarios, and the best ones have their own agency. In these respects, you can think of them like Pokemon – able to go out and do your bidding, with the right preparation.

Change agents

The content draws heavily from ideas shared on this blog.

Narrative Visualisation Tools

I use narrative visualisations a lot. I like to frame evidence so that it commands attention, engages playful minds, and tells its own story (see also Corporate Graffiti). I’ll put new tools on GitHub as I create them. Here are three to start.

Visualising Stand-Up Attendance

I used the Space Invader metaphor with a busy leadership team to explain how things would slip through the gaps from day to day if they didn’t attend stand-up in sufficient numbers and with sufficient regularity. The invaders represent the team members present each day, and each advancing row is a new day. The goal of the game is reversed in this case – we want the invaders to win! The team loved it and loved seeing their improved attendance reflected in a denser mesh of invaders.

Standup Space Invaders

Source on GitHub.

Aggregating Retrospectives

Useful if you want to aggregate multiple retrospectives – either the same team over time, or multiple teams on a common theme – and present them back while preserving the sincerity of the original outputs.

Re-retro screenshot

Source on GitHub.

Cycle Times from Trello

Trello is a wonderful tool for introducing visual management. It is not, however, great for reporting. Trycle (source on GitHub) will calculate cycle times for all cards transitioning between two lists using the JSON export of a Trello board (or the dwell time if just one list). Visuals and narrative not included.
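If you’d rather roll your own than read the Trycle source, the gist of the calculation looks something like this – a rough sketch against the field names in Trello’s JSON export, with error handling omitted:

```python
import json
from datetime import datetime

def cycle_times(board_json_path, start_list, end_list):
    """Days between a card entering start_list and entering end_list."""
    with open(board_json_path) as f:
        actions = json.load(f)["actions"]

    entered, times = {}, {}
    for action in reversed(actions):  # the export lists actions newest first
        if action["type"] != "updateCard" or "listAfter" not in action["data"]:
            continue  # only card moves between lists are of interest
        card = action["data"]["card"]["id"]
        moved_to = action["data"]["listAfter"]["name"]
        when = datetime.strptime(action["date"], "%Y-%m-%dT%H:%M:%S.%fZ")
        if moved_to == start_list:
            entered[card] = when
        elif moved_to == end_list and card in entered:
            times[card] = (when - entered[card]).total_seconds() / 86400
    return times

# e.g. cycle_times("board.json", "Doing", "Done")
```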

Visual Knowledge Cycles

Visualisation is a key tool for the management of knowledge, especially knowledge from data. We’ll explore different states of knowledge, and how we can use visualisation to drive knowledge from one state to another, as individual creators of visualisation and agents within an organisation or society.

Visualisation Cycle

(There’s some justifiable cynicism about quadrant diagrams with superimposed crap circles. But, give me a chance…)

Awareness and Certainty about Knowledge

We’re used to thinking about knowledge in terms of a single dimension: we know something more or less well. However, we’ll consider two dimensions of knowledge. The first is certainty – how confident are you that what you know is right? (Or wrong?) The second is awareness – are you even conscious of what you know? (Or don’t know?)

These two dimensions define four states of knowledge – a framework you might recognise – from “unknown unknowns” to “known knowns”. Let’s explore how we use visualisation to drive knowledge from one state to another.

Knowledge states

(Knowledge is often conceived along other dimensions, such as tacit and explicit, due to Nonaka and Takeuchi. I’d like to include a more detailed discussion of this model in future, but for now will note that visualisation is an “internalisation” technique in this model, or an aid to “socialisation”.)

Narrative Visualisation

I think this is the easiest place to start, because narrative visualisation helps us with knowledge we are aware of. Narrative visualisation means using visuals to tell a story with data.

Narrative Visualisation

We can use narrative visualisation to drive from low certainty to high certainty. We can take a “known unknown”, or a question, and transform it to a “known known”, or an answer.

“Where is time spent in this process?” we might ask. A pie chart provides a simple answer. However, it doesn’t tell much of a story. If we want to engage people in the process of gaining certainty, if we want to make the story broader and deeper, we need to visually exploit a narrative thread. Find a story that will appeal to your audience and demonstrate why they should care about this knowledge, then use the narrative to drive the visual display of data. Maybe we emphasise the timeliness by displaying the pie chart on a stopwatch, or maybe we illustrate what is done at each stage to provide clues for improvement. (NB. Always exercise taste and discretion in creating narrative visualisations, or they may be counter-productive.)
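For instance, the simple, story-free answer might look like this (a matplotlib sketch with made-up numbers):

```python
import matplotlib.pyplot as plt

stages = ["Analysis", "Build", "Test", "Deploy", "Waiting"]
days = [3, 5, 4, 1, 12]  # made-up numbers: waiting dominates

plt.pie(days, labels=stages, autopct="%1.0f%%")
plt.title("Where is time spent in this process?")
plt.show()
```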

Here is a brilliant and often cited narrative visualisation telling a powerful story about drone strikes in Pakistan.

Screen shot of Pitch Interactive Drones Visualisation

The story also provides a sanity check for your analysis – is the story coherent, is it plausible? This helps us to avoid assigning meaning to spurious correlations (e.g. ski accidents and bed-sheet strangulation), but do keep an open mind all the same.

Discovery Visualisation

But where do the questions to be answered come from? This is the process of discovery, and we can use visualisation to drive discovery.

Discovery Visualisation

Discovery can drive from low awareness, low certainty to high awareness, low certainty – from raw data to coherent questions. Discovery is where to start when you have “unknown unknowns”.

But how do you know you have “unknown unknowns”? Well, the short answer is: you do have them – that’s the thing about awareness. However, we’ll explore a longer answer too.

If someone drops a stack of new data in your lap (and I’m not suggesting that is best practice!), it’s pretty clear you need to spend some time discovering it, mapping out the landscape. However, when it’s data in a familiar context, the need for discovery may be less clear – don’t you already know the questions to be answered? We’ll come back to that question later.

A classic example of this kind of discovery can be found at Facebook Engineering, along with a great description of the process.

Facebook friends visualisation

In discovery visualisation, we let the data lead, we slice and dice many different ways, we play with the presentation, and we use data in as raw a form as possible. We don’t presuppose any story. On our voyage of discovery, we need to hack through undergrowth to make headway and scale peaks for new vistas, and in that way allow the data to reveal its own story.

Inductive Drift

What if you’ve done your discovery and done your narration? You’re at “known knowns”, what more need you do?

If the world was linear, the answer would be “nothing”. We’d be done (ignoring the question of broader scope). The world is not linear, though. Natural systems have complex interactions and feedback cycles. Human systems, which we typically study, comprise agents with free will, imagination, and influence. What happens is that the real world changes, and we don’t notice.

We don’t notice because our thinking process is inductive. What that means is that our view of the world is based on an extrapolation of a very few observations, often made some time in the past. We also suffer from confirmation bias, which means we tend to downplay or ignore evidence that contradicts our view of the world. This combination makes it very hard to shift our beliefs – superstitions, even. (The western belief that men had one fewer rib than women persisted until the 16th century CE due to the biblical story of Adam and Eve.)

So where does this leave us? It leaves us with knowledge of which we are certain, but unaware. These are the slippery “unknown knowns”, though I think a better term is biases.

Unlearning Visualisation

Unlearning visualisation is how we dispose of biases and embrace uncertainty once more. This is how we get to a state of “unknown unknowns”.

Unlearning Visualisation

However, as above, unlearning is difficult, and may require overwhelming contradictory evidence to cross an “evidentiary threshold”. We must establish a “new normal” with visuals. This should be the primary concern of unlearning visualisation – to make “unknown unknowns” look like an attractive state.

Big data is particularly suited to unlearning, because we can – if we construct our visualisation right – present viewers with an overwhelming number of sample points.

Unlearning requires both data-following and story-telling approaches. If we take away one fact-based story viewers tell themselves about the world, we need to replace it with another.

Recap

Visualisation Cycle

Your approach to visualisation should be guided by your current state of knowledge:

  • If you don’t know what questions to ask, discovery visualisation will help you find key questions. In this case, you are moving from low awareness to high awareness of questions, from “unknown unknowns” to “known unknowns”.
  • If you are looking to answer questions and communicate effectively, narrative visualisation helps tell a story with data. In this case, you are moving from low certainty to high certainty, from “known unknowns” to “known knowns”.
  • If you have thought for some time that you know what you know and know it well, you may be suffering from inductive drift. In this case, use unlearning visualisation to establish a new phase of inquiry: you are moving from high certainty and awareness to low certainty and awareness, returning to “unknown unknowns”.

Of course, it may be difficult to assess your current state of knowledge! You may have multiple states superimposed. You may only be able to establish where you were in hindsight, which isn’t very useful in the present. However, this framework can help to cut through some of the fog of analysis, providing a common language for productive conversations, and providing motivation to keep driving your visual knowledge cycles.

Corporate Graffiti – Being Disruptive with Visual Thinking

As you go about your work you’ll come up against walls. Some walls will be blank and boring blockers to progress. These need decoration; spraying with layers that confer meaning. So pick a corner and start doodling. With a new perspective, you’ll find a way around the blockers. Other walls will come with messages – by design or default – leading you in a certain direction. If this isn’t where you want to go, you’ll need to plot your own course by subverting or overwhelming the prevailing visuals.

This is your challenge, and your opportunity for innovation: to disrupt the established visual environment with new ways of looking at the world that, in turn, unlock new ways of thinking. If you think you could make your organisation more agile with some disruptive visual thinking, read on for my experience [on the Organisational Agility channel of ThoughtWorks Insights].

Seeing Stars – Bespoke AR for Mobiles

I presented on the development of the awesome Fireballs in the Sky app (iOS and Android) at YOW! West with some great app developers. See the PDF. (NB. there were a lot of transitions)

Abstract

We’ll explore the development of the Fireballs in the Sky app, designed for citizen scientists to record sightings of meteorites (“fireballs”) in the night sky. We’ll introduce the maths for AR on a mobile device, using the various sensors, and we’ll throw in some celestial mechanics for good measure.

We’ll discuss the prototyping approach in Processing. We’ll describe the iOS implementation, including: libraries, performance tuning, and testing. We’ll then do the same for the Android implementation. Or maybe the other way around…