
Optimising an AI Model with the Confusion Matrix

What is Einstein Platform Services?

Einstein Platform Services lets developers build custom AI-powered models that solve a wide range of real-life use cases. It provides a set of Application Programming Interfaces (APIs) for building intelligent, predictive app experiences. Today, Einstein Platform Services include:
  • Einstein Image Classification (Einstein Vision)
  • Einstein Object Detection (Einstein Vision)
  • Einstein Intent (Einstein Language)
  • Einstein Sentiment (Einstein Language)
These services enable developers to create custom deep learning models from unstructured text and image data, which can be trained, refined and used for building intelligent apps.

Create, Train and Predict

Using Einstein Platform Services is a simple three-step process. First, you create and upload a dataset, such as text or image data. Once the dataset is uploaded, you use it to train a model. The trained model then predicts which class a new input falls into, based on the classes defined in your training dataset.
Einstein Platform Services flow

Note: If the predictions are not accurate, you can use the feedback APIs and the model’s confusion matrix to analyze the training outcome.

Before you start: if this is your first time using and exploring the Einstein Platform APIs, I recommend going through the Trailhead modules. Once you have a basic knowledge of the Einstein Platform, you can get started with the Einstein Vision and Language Model Builder and then move on to more complex concepts in Einstein Platform Services.

What is the Confusion Matrix?

A confusion matrix is a specific table layout that visualizes the performance of an algorithm. In Salesforce, you can use this grid of values to evaluate the performance of Einstein Vision and Language models.
The statistics derived from the confusion matrix can be used further to analyse, optimize or refactor the dataset. For example, you can use the information to identify whether you need any label correction in the dataset or need to introduce a new label. You will see how it can be done in our example.
Typical Confusion Matrix
The confusion matrix essentially places the resulting predictions into four groups: TP, TN, FP and FN.
  • True positive (TP) and true negative (TN) are the cases where the model classifies the image or statement correctly.
  • False positive (FP) and false negative (FN) are the errors, known as Type I and Type II errors respectively.
Note: In statistical hypothesis testing, a Type I error is the rejection of a true null hypothesis, while a Type II error is the non-rejection of a false null hypothesis.
A number of metrics that measure the performance of the model can be derived from the confusion matrix. The most commonly used metrics are:
  1. Precision
  2. Recall
  3. F1 score
Note: Details about formulas and calculations can be found here
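As a quick illustration of how these three metrics relate to the four groups, here is a minimal Python sketch using the standard textbook definitions. The function names and the sample counts are made up for illustration; they are not part of the Einstein API.

```python
def precision(tp, fp):
    """Share of positive predictions that were correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical counts for a single label.
    tp, fp, fn = 8, 2, 2
    print("precision:", precision(tp, fp))
    print("recall:   ", recall(tp, fn))
    print("f1:       ", f1_score(tp, fp, fn))
```

Note that true negatives do not appear in any of the three formulas, which is why these metrics stay informative even when one label dominates the dataset.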

Accessing Model Metric

The model metrics call returns the F1 score, accuracy and confusion matrix for a model. Together, these metrics give you a picture of the model’s accuracy and how well it is likely to perform.
Accessing Image Model’s Metric

Accessing Language Model’s Metric
Note: This call returns the metrics for the last epoch of the training run that created the model. To see the metrics for each epoch, use the Learning Curve endpoint (/lc).
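The two metric calls differ only in the service segment of the URL. The sketch below builds (but does not send) such a request with Python's standard library. The base URL, the path layout, the header set and the function name build_metrics_request are assumptions for illustration; check them against the current Einstein Platform Services API reference before relying on them.

```python
import urllib.request

# Assumed base URL and path layout; verify against the API reference.
BASE_URL = "https://api.einstein.ai/v2"


def build_metrics_request(service, model_id, token, learning_curve=False):
    """Build a GET request for a model's metrics without sending it.

    service: "vision" for image models, "language" for text models.
    learning_curve: if True, append /lc to ask for per-epoch metrics.
    """
    if service not in ("vision", "language"):
        raise ValueError("service must be 'vision' or 'language'")
    url = f"{BASE_URL}/{service}/models/{model_id}"
    if learning_curve:
        url += "/lc"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Cache-Control": "no-cache",
        },
        method="GET",
    )
```

To actually send the request, pass the returned object to urllib.request.urlopen and decode the body with json.load.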
The call to get the model metrics returns a JSON response that contains the labels, the F1 scores, the accuracy values and the confusionMatrix array.

Start building the matrix by writing the labels vertically and horizontally. The given example has two labels, which can be placed as shown below:




Then place the confusionMatrix contents for each label:




In the grid, all the diagonal values are accurate predictions by the model, whereas the off-diagonal values represent errors. In the example above, you can see that a couple of images labelled as Beaches were identified as Mountains.
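The same grid can be built in a few lines of Python. This is a minimal sketch that assumes the rows of confusionMatrix are the actual labels and the columns are the predicted labels, in the same order as the label list. The numbers are invented for illustration and are not the values from the response above.

```python
def format_grid(labels, matrix):
    """Return a text grid: actual labels as rows, predicted labels as columns."""
    width = max(len(label) for label in labels) + 2
    header = " " * width + "".join(label.rjust(width) for label in labels)
    rows = [
        label.ljust(width) + "".join(str(value).rjust(width) for value in row)
        for label, row in zip(labels, matrix)
    ]
    return "\n".join([header] + rows)


def per_label_stats(labels, matrix):
    """Derive TP, FP and FN for each label from the matrix."""
    stats = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]                         # diagonal cell
        fn = sum(matrix[i]) - tp                  # rest of row i
        fp = sum(row[i] for row in matrix) - tp   # rest of column i
        stats[label] = {"tp": tp, "fp": fp, "fn": fn}
    return stats


if __name__ == "__main__":
    # Illustrative values only.
    labels = ["Beaches", "Mountains"]
    confusion_matrix = [[8, 2], [0, 10]]
    print(format_grid(labels, confusion_matrix))
    print(per_label_stats(labels, confusion_matrix))
```

With these sample values, the two Beaches images predicted as Mountains show up as false negatives for Beaches and as false positives for Mountains.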

How metric data is calculated

When model training begins, one portion of the dataset is used for the actual training and the remaining portion is used to test the model. The ratio between test data and training data is called the split ratio.
The developer can define this ratio by setting the trainSplitRatio parameter of the /train endpoint. By default, 10% of the data is held out for testing image models, whereas 20% is held out for testing text-based models.
Sample train status response with total examples, trainSplitSize and testSplitSize
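To get a feel for how the split sizes relate to the total number of examples, here is a rough sketch. It assumes that trainSplitRatio is the fraction of the data used for training (so 0.9 corresponds to the 10% test default for images) and that sizes are rounded to the nearest whole example. The service's exact rounding rule is not documented here, so the real numbers may differ by one.

```python
def split_sizes(total_examples, train_split_ratio):
    """Estimate (train size, test size) for a given split ratio.

    Assumption: train_split_ratio is the fraction used for training.
    """
    if not 0 < train_split_ratio < 1:
        raise ValueError("train_split_ratio must be between 0 and 1")
    train_size = round(total_examples * train_split_ratio)
    test_size = total_examples - train_size
    return train_size, test_size


if __name__ == "__main__":
    print(split_sizes(100, 0.9))  # image default described above
    print(split_sizes(100, 0.8))  # text default described above
```

A small test split makes the confusion matrix coarse: with only a handful of test examples per label, a single misclassification moves the metrics a lot.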

Optimising a Model with the Confusion Matrix

So far you have learned how to get the training status, check and use the trainSplitRatio parameter, use model metrics, and build and read a confusion matrix.
You now have a visual explanation of your model’s accuracy, and you can use this information to improve future predictions by optimizing the dataset and retraining the model. Remember that in the ideal confusion matrix all non-zero values lie on the diagonal.
Identifying misclassified data is the most important step in optimizing your model. This can be done manually by scanning every item (image/text) and its label in the dataset (practical for a small set of data), or by calling the /predict endpoint for every image/text in the dataset. Once you have found the misclassified items, you have several options:
  1. Sometimes data is added with the wrong label. This can be fixed quickly by correcting the label, creating a new dataset and training a new model.
  2. Confusing items that might belong to more than one label can be removed, or you can introduce a new label for them.
  3. Adding more data to the dataset is generally the most reliable way to increase the training accuracy of the model. You can also change the trainSplitRatio parameter for better test results.
  4. Use a feedback loop to improve the model’s prediction capabilities: the model’s misclassified data is used to train new versions of the model.
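The automated scan can be sketched as follows. The predict argument is a hypothetical callable that wraps a /predict call and returns the top label for one item; it is not a function provided by the Einstein API. Everything else is plain Python.

```python
from collections import Counter


def find_misclassified(dataset, predict):
    """Return (item, expected_label, predicted_label) for every wrong prediction.

    dataset: iterable of (item, expected_label) pairs.
    predict: hypothetical callable mapping an item to its predicted label.
    """
    misses = []
    for item, expected in dataset:
        predicted = predict(item)
        if predicted != expected:
            misses.append((item, expected, predicted))
    return misses


def confusion_pairs(misses):
    """Count how often each (expected, predicted) pair was confused."""
    return Counter((expected, predicted) for _, expected, predicted in misses)


if __name__ == "__main__":
    # Stand-in predictor so the sketch runs without network access.
    fake_predictions = {"a.jpg": "Beaches", "b.jpg": "Mountains", "c.jpg": "Mountains"}
    dataset = [("a.jpg", "Beaches"), ("b.jpg", "Beaches"), ("c.jpg", "Mountains")]
    misses = find_misclassified(dataset, fake_predictions.get)
    print(misses)
    print(confusion_pairs(misses).most_common())
```

The most frequent pairs point at the labels worth reviewing first, whether that means relabelling items, adding a new label, or feeding the items back into training.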
