ML Interpretability with Ambient Visualisations

I produced some ambient visualisations as background to short talks on the topic of Interpreting the Black Box of ML from ThoughtWorks Technology Radar Volume 21. The talks were presented in breaks at the YOW Developer Conference.

Animation of linear to non-linear model selection

Here are my speaker notes.

Theme Intro

The theme I’m talking about is Interpreting the Black Box of ML.

It’s a theme because the radar has a lot of ML blips – those are the individual tools, techniques, languages and frameworks we track, and they all have an aspect of interpretability.

I’m going to talk first about Explainability as a First Class Model Concern.

Explainability as a First Class Model Concern

ML models make predictions. They take some inputs and predict an output, based on the data they’ve been trained on. Without careful thought, those predictions can be black boxes.

For example – predicting whether someone should be offered credit. A few people at the booth mentioned this experience: “[the] black box algorithm thinks I deserve 20x the credit limit [my wife] does” – and the difficulty in getting an explanation from the provider (this was a topical example at the time).

Elevated to a first class concern, however, ML predictions are interpretable and explainable to different degrees – it’s not actually a question of black box or white box, but many shades of grey.

Spectrum

Interpretable means people can reason about a model’s decision-making process in general terms, while explainable means people can understand the factors that led to a specific decision. People are important in this definition – a data scientist may be satisfied with the explanation that the model minimises total loss, while a declined credit applicant probably requires and deserves a reason code.

And those two extremes can anchor our spectrum – at one end we can explain a result as a general consequence of ML, at the other end explaining the specific factors that contributed to an individual decision.

Dimensions – What

As dimensions of explainability, we should consider:

  • The choice of modelling technique as intrinsically explainable
  • Model agnostic explainability techniques
  • Whether global or just local interpretability is required

Considering model selection – a decision tree is intrinsically explainable – factors contribute sequentially to a decision. A generic deep neural network is not. However, in between, we can architect networks to use techniques such as embeddings, latent spaces or transfer learning, which create representations of inputs that are distinct and interpretable to a degree, but not always in human terms.
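As a minimal sketch of intrinsic explainability (not from the talk – it uses scikit-learn and its bundled Iris data rather than the credit example), a fitted decision tree can be printed as human-readable rules, so a person can trace the path behind any prediction:

```python
# Sketch: an intrinsically explainable model - the fitted tree can be
# printed as if/else rules that a person can follow.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# Every prediction can be traced through these splits.
print(export_text(tree, feature_names=iris.feature_names))
```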

And so model-specific explainability relies on the modelling technique, while model-agnostic techniques are empirically applicable to any model. We can create surrogate explainable models for any given model, such as a wide network paired with a deep network, and we can use ablation to explore the effect of changing inputs on a model’s decisions.
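To make the model-agnostic idea concrete, here is an illustrative sketch (the random forest below is just a stand-in for any black box): fit a simple surrogate to the black box’s own predictions and inspect the surrogate, and use a crude ablation to see how much the black box depends on one input.

```python
# Sketch: a global surrogate plus a crude ablation test.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# The surrogate is trained on the black box's outputs, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# Fidelity: how often the interpretable surrogate agrees with the black box.
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(f"Surrogate fidelity: {fidelity:.2%}")

# Crude ablation: scramble one feature and see how many predictions change.
X_ablate = X.copy()
X_ablate[:, 0] = np.random.permutation(X_ablate[:, 0])
flips = (black_box.predict(X_ablate) != black_box.predict(X)).mean()
print(f"Predictions changed when feature 0 is scrambled: {flips:.2%}")
```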

For a given decision, we might only wish to understand how that decision would have been different had the inputs changed slightly. In this case we are concerned only with local interpretability and explainability, not the model as a whole, and LIME is an effective technique.
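A minimal local-explanation sketch with LIME might look like the following (assuming the `lime` package and a scikit-learn classifier on the Iris data; the model and data are illustrative, not from the talk):

```python
# Sketch: a local explanation - which features drove this one prediction?
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)

# Perturb one instance, fit a local linear model around it, and report
# the feature weights for this decision only.
explanation = explainer.explain_instance(data.data[0], model.predict_proba, num_features=4)
print(explanation.as_list())
```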

Reasons – Why

As broader business concerns, we should care about explainability because:

  • Knowledge management is crucial for organisations – an interpretable model, such as the Glasgow Coma Scale, may be valued more for people’s ability to use it than its pure predictive performance
  • We must comply with local laws, and it is in all stakeholders’ interests that we act ethically
  • And finally, models can always make mistakes, so a challenge process must be considered, especially as vulnerable people are disproportionately subject to automated decision making

And explainability is closely linked to ethics, and hence the rise of ethical bias testing.

Ethical Bias Testing

Powerful, but Concerning

There is rising concern that powerful ML models could cause unintentional harm. For example, a model could be trained to make profitable credit decisions by simply excluding disadvantaged applicants. So we’re seeing a growing interest in ethical bias testing that will help to uncover potentially harmful decisions, and we expect this field to evolve over time.

Measures

There are many statistical measures we can use to detect unfairness in models. These measures compare outcomes for privileged and unprivileged groups under the model. If we find a model is discriminating against an unprivileged group, we can apply various mitigations to reduce the inequality.  

  • Equal Opportunity Difference is the difference in true positive rates between an unprivileged group and a privileged group. A value close to zero is good.
  • Disparate Impact is the ratio of selection rates between the two groups. The selection rate is the number of individuals selected for the positive outcome divided by the total number of individuals in the group. The ideal value for this metric is 1.

These are just two examples of more than 70 different metrics for measuring ethical bias. Choosing what measure or measures to use is an ethical decision itself, and is affected by your goals. For example, there is the choice between optimising for similarity of outcomes across groups or trying to optimise so that similar individuals are treated the same. If individuals from different groups differ in their non-protected attributes, these could be competing goals.
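As a minimal sketch (using made-up arrays rather than any specific fairness toolkit), the two measures above can be computed directly from model outcomes and group membership:

```python
# Sketch: Equal Opportunity Difference and Disparate Impact from raw outcomes.
import numpy as np

y_true = np.array([1, 1, 0, 1, 0, 1, 0, 1, 1, 0])                  # actual outcomes
y_pred = np.array([1, 0, 0, 1, 0, 1, 0, 0, 1, 1])                  # model decisions
privileged = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)  # group membership

def true_positive_rate(y_true, y_pred, mask):
    positives = mask & (y_true == 1)
    return (y_pred[positives] == 1).mean()

def selection_rate(y_pred, mask):
    return (y_pred[mask] == 1).mean()

# Close to zero is good.
eod = (true_positive_rate(y_true, y_pred, ~privileged)
       - true_positive_rate(y_true, y_pred, privileged))

# Close to one is good.
di = selection_rate(y_pred, ~privileged) / selection_rate(y_pred, privileged)

print(f"Equal Opportunity Difference: {eod:+.2f}")
print(f"Disparate Impact: {di:.2f}")
```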

Correction

To correct for ethical bias or unfairness, mitigations can be applied to the data, to the process of generating the model, and to the output of the model.

  • Data can be reweighted to increase fairness, before running the model.
  • While the model is being generated, it can be penalised for ethical bias or unfairness.
  • Or, after the model is generated, its output can be post-processed to remove bias.

As with explainability, the process of removing ethical bias or improving fairness will likely reduce the predictive performance or accuracy of a model; however, there is a continuum of possible tradeoffs.
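As a minimal sketch of the first mitigation above – reweighting the data, in the spirit of Kamiran and Calders’ reweighing technique, hand-rolled here rather than taken from a specific toolkit – each (group, label) combination is weighted so that group membership and the favourable outcome become statistically independent in the training data:

```python
# Sketch: reweight training examples so group and label are independent.
import numpy as np

y = np.array([1, 1, 0, 1, 0, 1, 0, 0, 0, 0])        # labels (1 = favourable outcome)
group = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])    # 1 = privileged group

n = len(y)
weights = np.empty(n)
for g in (0, 1):
    for label in (0, 1):
        cell = (group == g) & (y == label)
        # Expected count if group and label were independent, divided by observed count.
        expected = (group == g).sum() * (y == label).sum() / n
        weights[cell] = expected / cell.sum()

# These weights can then be passed as sample_weight when fitting most scikit-learn models.
print(weights)
```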

What-if Tool

What is What if

I mentioned that tooling is being developed to help with explainability and ethical bias testing, and you should familiarise yourself with these tools and the techniques they use. One example is the What-if Tool – an interactive visual interface designed to help you dig into a model’s behaviour. Launched by Google’s PAIR lab, it helps data scientists understand more about the predictions their model is making.

Features

You can do things like:

  •  Compare models to each other
  •  Visualize feature importance
  •  Arrange datapoints by similarity
  •  Test algorithmic fairness constraints

Risk

But by themselves tools like this won’t give you explainability or fairness, and using them naively won’t remove the risk or minimize the damage done by a misapplied or poorly trained algorithm. They should be used by people who understand the theory and implications of the results. However, they can be powerful tools to help communicate, tell a story, make the specialised analysis more accessible, and hence motivate improved practice and outcomes.

CD4ML

The radar also mentions CD4ML for the second time – using Continuous Delivery practices to deliver ML solutions. CD in general encourages solutions to evolve in small steps, and the same is true for ML solutions. The benefit of this is that we can more accurately identify the reasons for any change in system behaviour if they are the result of small changes in design or data. So we also highlight CD4ML as a technique for addressing explainability and ethical bias.

Cost Sensitive Learning – A Hitchhikers Guide

Typically, prediction is about getting the right answer. But many prediction problems have large and asymmetric costs for different types of mistakes, and often the chance of making mistakes is exacerbated by imbalances in the training data. Cost-Sensitive Learning covers a range of techniques for extending standard ML approaches to deal with imbalanced data and outcomes. Cost-sensitive predictions will instead favour the most valuable or lowest-risk answers.
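As a minimal sketch (the cost values are illustrative only), one common approach is to keep a standard probabilistic classifier but choose the prediction that minimises expected cost rather than simply the most likely class:

```python
# Sketch: cost-sensitive prediction - pick the class with minimum expected cost.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Imbalanced data: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
model = LogisticRegression().fit(X, y)

# cost[i, j] = cost of predicting class j when the true class is i.
# Here a missed positive (false negative) is 10x worse than a false alarm.
cost = np.array([[0.0, 1.0],
                 [10.0, 0.0]])

proba = model.predict_proba(X)        # shape (n_samples, 2)
expected_cost = proba @ cost          # expected cost of each possible prediction
cost_sensitive_pred = expected_cost.argmin(axis=1)

print("Positives predicted (most likely class):", model.predict(X).sum())
print("Positives predicted (minimum expected cost):", cost_sensitive_pred.sum())
```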

I presented Cost Sensitive Learning – A Hitchhikers Guide at the Melbourne ML/AI Meetup.

Reasoning About Machine Intuition

This talk discusses the resurgence of Machine Learning and neural networks from multiple perspectives of digital delivery, including: product & design, iterative implementation, organisational design, governance and risk. I chose to use “Intuition” to distinguish ML’s capability for pattern recognition from other descriptions of intelligence. Slides here.

Scaling Change

Once upon a time, scaling production may have been enough to be competitive. Now, the most competitive organisations scale change to continually improve customer experience. How can we use what we’ve learned scaling production to scale change?

Metaphors for scaling

I recently presented a talk titled “Scaling Change”. In the talk I explore the connections between scaling production, sustaining software development, and scaling change, using metaphors, maths and management heuristics. The same model of change applies from organisational, marketing, design and technology perspectives.  How can factories, home loans and nightclubs help us to think about and manage change at scale?

Read on with the spoiler post if you’d rather get right to the heart of the talk.

Arguments with Agency

Here are slides from my talk at LASTconf 2015. The title is “Bring Your A-Game to Arguments for Change”. The premise is that there are different types of arguments, more or less suited to various organisational and delivery scenarios, and the best ones have their own agency. In these respects, you can think of them like Pokemon – able to go out and do your bidding, with the right preparation.

Change agents

The content draws heavily from ideas shared on this blog.

Seeing Stars – Bespoke AR for Mobiles

I presented on the development of the awesome Fireballs in the Sky app (iOS and Android) at YOW! West with some great app developers. See the PDF. (NB. there were a lot of transitions)

Abstract

We’ll explore the development of the Fireballs in the Sky app, designed for citizen scientists to record sightings of meteorites (“fireballs”) in the night sky. We’ll introduce the maths for AR on a mobile device, using the various sensors, and we’ll throw in some celestial mechanics for good measure.

We’ll discuss the prototyping approach in Processing. We’ll describe the iOS implementation, including: libraries, performance tuning, and testing. We’ll then do the same for the Android implementation. Or maybe the other way around…

Playing Games is Serious Business

Simple game scenarios can produce the same outcomes as complex and large-scale business scenarios. Serious business games can therefore reduce risk and improve outcomes when launching new services. Gamification also improves alignment and engagement across organisational functions.

This is a presentation on using games to understand and improve organisational design and service delivery, which I presented at the Curtin University Festival of Teaching and Learning.

(Don’t be concerned by what looks like a bomb-disposal robot in the background.)

The slides provide guidance on applying serious business games in your context.

Data Visualisation: Good for Business

It was great to be part of the recent ThoughtWorks data visualisation event in Perth. There’s a summary on the ThoughtWorks Insights channel.

Visualisation is a topic I love talking about – especially demonstrating why it’s good for business – and presenting with Ray Grasso was a lot of fun.

Here’s the full video of the presentation.

If you want to pick and choose:

  • I start with the historical perspective and current state
  • 5.40, Ray starts the IMO story
  • 28.55, I start the call centre story
  • 41.53, Ray starts the NOPSEMA story
  • 54.39, We take questions

I’ve been talking to people about the event, and they always say something like:

“I’m such a visual person. I love it when people explain things to me visually.”

No-one ever says:

“Don’t show me a picture.”

Words are important, of course, as are other means of communicating. We all have multiple ways of processing information. However, visual processing is almost always a key component. Consider my friend the lawyer, who remembered cases because her lecturer pinned them on a map and illustrated them with holiday snap shots. I’m sure you have a similar example.

So we “see” that data visualisation is good for humans. And what’s good for humans is good for business. Key business outcomes include engaging communications, operational clarity, and unexpected insights.

Enough words. Browse the slides below or watch the presentation above.

Thanks to Diana Adorno for the feature pic.

Leave Product Development to the Dummies

This is the talk I gave at Agile Australia 2013 about the role of simulation in product development. Check out a PDF of the slides with brief notes.

Description

"Dummies" talk at Agile Australia

Stop testing on humans! Auto manufacturers have greatly reduced the harm once caused by inadvertently crash-testing production cars with real people. Now, simulation ensures every new car endures thousands of virtual crashes before even a dummy sets foot inside. Can we do the same for software product delivery?

Simulation can deliver faster feedback than real-world trials, for less cost. Simulation supports agility, improves quality and shortens development cycles. Designers and manufacturers of physical products found this out a long time ago. By contrast, in Agile software development, we aim to ship small increments of real software to real people and use their feedback to guide product development. But what if that’s not possible? (And can we still benefit from simulation even when it is?)

The goal of trials remains the same: get a good product to market as quickly as possible (or pivot or kill a bad product as quickly as possible). However, if you have to wait for access to human subjects or real software, or if it’s too costly to scale to the breadth and depth of real-world trials required to optimise design and minimise risk, consider simulation.

Learn why simulation was chosen for the design of call centre services (and compare this with crash testing cars), how a simulator was developed, and what benefits the approach brought. You’ll leave equipped to decide whether simulation is appropriate for your next innovation project, and with some resources to get you started.

Discover:

  • How and when to use simulation to improve agility
  • The anatomy of a simulator
  • A lean, risk-based approach to developing and validating a simulator
  • Techniques for effectively visualising and communicating simulations
  • Implementing simulated designs in the real world