More Sankey for Less Confusion?

Confusion matrixes are essential for evaluating classifiers, but for many people who are new to them, they cause, well, confusion.

Sankey diagrams are an alternative way of representing matrix data, and I’ve found that some people who are new to matrix data, such as business domain experts who are not experienced data scientists, find them easier to understand. Some machine learning researchers also find Sankey diagrams useful for analysing data and classifiers.

So, I have posted simple code for visualising classifier evaluation or comparisons as Sankey diagrams. Maybe it will be useful for others, as well as fun for me.

The code combines large portions of Plotly Sankey diagrams with essence of the scikit-learn confusion matrix and lashings of list comprehension code golf.
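To give a flavour of the recipe, here is a minimal sketch of the idea; it is not the code from the repository. Each cell of the confusion matrix becomes one Sankey link from an "actual" node to a "predicted" node. The function names `sankey_links` and `plot_sankey` are made up for illustration, and the counting is done with `collections.Counter` so the sketch runs without scikit-learn (the counts match what `sklearn.metrics.confusion_matrix` would give for the same labels). Plotly is only imported when you actually plot.

```python
from collections import Counter


def sankey_links(y_true, y_pred):
    """Turn paired labels into Sankey node labels and links.

    Hypothetical helper: one link per non-empty confusion-matrix cell.
    """
    classes = sorted(set(y_true) | set(y_pred))
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    # Count each (actual, predicted) pair: these are the confusion-matrix cells.
    counts = Counter(zip(y_true, y_pred))
    labels = [f"actual {c}" for c in classes] + [f"predicted {c}" for c in classes]
    # Nodes 0..n-1 are the actual classes, nodes n..2n-1 the predicted classes.
    source = [index[t] for (t, p) in counts]
    target = [n + index[p] for (t, p) in counts]
    value = list(counts.values())
    return labels, source, target, value


def plot_sankey(labels, source, target, value):
    """Build a Plotly figure from the links (requires plotly to be installed)."""
    import plotly.graph_objects as go

    return go.Figure(
        go.Sankey(
            node=dict(label=labels),
            link=dict(source=source, target=target, value=value),
        )
    )


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
    print(sankey_links(y_true, y_pred))
    # plot_sankey(*sankey_links(y_true, y_pred)).show()
```

The same function works for multi-class labels, because the nodes are derived from whatever classes appear in the data.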

The scenarios supported are:

Evaluating a binary classifier against ground truth or as champion-challenger,
Evaluating a multi-class classifier against ground truth or as champion-challenger,
Comparing multiple stages of a decision process, or multiple versions of a binary classifier, for instance over time, or hyper-parameter sweeps, and
Comparing multiple versions of a multi-class classifier.
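
For the multi-stage scenarios, one way to think about it is as a chain of confusion matrixes, each linking one stage to the next. The sketch below is my illustration under that assumption, not the repository code; `staged_links` and the stage names are hypothetical, and stage names are assumed to be unique.

```python
from collections import Counter


def staged_links(stages):
    """Link each stage to the next one.

    stages: list of (name, labels) pairs, with unique names and label
    lists of equal length (one entry per item being classified).
    Returns node labels plus source/target/value lists for a Sankey diagram.
    """
    # Give every (stage, class) combination its own node index.
    node_ids = {}
    for name, labels in stages:
        for c in sorted(set(labels)):
            node_ids[(name, c)] = len(node_ids)

    source, target, value = [], [], []
    # Each consecutive pair of stages contributes one confusion matrix of links.
    for (left_name, left), (right_name, right) in zip(stages, stages[1:]):
        for (a, b), n in Counter(zip(left, right)).items():
            source.append(node_ids[(left_name, a)])
            target.append(node_ids[(right_name, b)])
            value.append(n)

    node_labels = [f"{name}: {c}" for (name, c) in node_ids]
    return node_labels, source, target, value


if __name__ == "__main__":
    stages = [
        ("truth", [1, 1, 0, 0]),
        ("v1", [1, 0, 0, 0]),
        ("v2", [1, 1, 0, 1]),
    ]
    print(staged_links(stages))
```

Note that linking only consecutive stages shows how counts move between neighbours; it does not track an individual item's full path across three or more stages.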

Example confusion matrixes as Sankey diagrams

For some scenarios where confusion matrixes and their derived measures come in handy, see Cost Sensitive Learning – A Hitchhikers Guide.

See the code on Github.

Posted

October 24, 2020

Article, Machine Learning, Maths, Visualisation

safety