MLOps Basics [Week 9]: Prediction Monitoring - Kibana

Why do we need monitoring?

Monitoring systems can help give us confidence that our systems are running smoothly and, in the event of a system failure, can quickly provide appropriate context when diagnosing the root cause.

The things we want to monitor during training and inference are different. During training we are concerned about whether the loss is decreasing, whether the model is overfitting, etc.

During inference, however, we want confidence that our model is making correct predictions.

There are many reasons why a model can fail to make useful predictions:

The underlying data distribution has shifted over time and the model has gone stale, i.e. the characteristics of the inference data differ from those of the data used to train the model.
The inference data stream contains edge cases (not seen during model training). In these scenarios the model might perform poorly or raise errors.
The model was misconfigured in its production deployment. (Configuration issues are common)

In all of these scenarios, the model could still make a successful prediction from a service perspective, but the predictions will likely not be useful. Monitoring machine learning models can help us detect such scenarios and intervene (e.g. trigger a model retraining/deployment pipeline).
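One simple way to spot a shift like the ones above is to compare how often each label is predicted in production against how often it appeared in training. The snippet below is only a minimal sketch of that idea: the label names, the sample counts and the threshold are all made up for illustration, and total variation distance is just one of several possible drift measures.

```python
from collections import Counter


def label_distribution(labels):
    """Return the relative frequency of each label in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def total_variation_distance(reference, current):
    """Half the sum of absolute frequency differences (0 = identical, 1 = disjoint)."""
    all_labels = set(reference) | set(current)
    return 0.5 * sum(
        abs(reference.get(label, 0.0) - current.get(label, 0.0))
        for label in all_labels
    )


# Hypothetical label samples, for illustration only.
training_labels = ["positive"] * 70 + ["negative"] * 30
recent_predictions = ["positive"] * 40 + ["negative"] * 60

drift = total_variation_distance(
    label_distribution(training_labels),
    label_distribution(recent_predictions),
)

DRIFT_THRESHOLD = 0.2  # assumed alerting threshold, tune it for your use case
needs_attention = drift > DRIFT_THRESHOLD
```

When `needs_attention` is true, that could be the signal to investigate the data or trigger a retraining pipeline.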

The scope of this post is to understand the basics of monitoring predictions.

There are many different tools available for monitoring.

I will be using Kibana.

In this post, I will be going through the following topics:

Basics of Cloudwatch Logs
Creating Elastic Search Cluster
Configuring Cloudwatch Logs with Elastic Search
Creating Index Patterns in Kibana
Creating Kibana Visualisations
Creating Kibana Dashboard

Basics of Cloudwatch Logs

What is Cloudwatch logs?

Amazon CloudWatch Logs is a service that collects and stores logs from your applications and infrastructure running on AWS. It provides the features expected of any log management tool: real-time monitoring, searching and filtering, and alerts.

Let's see how to check the logs for the Lambda we implemented last week.

Go to the MLOps-Basics Lambda. Navigate to the Logs part of the Monitor section. There will be a button labelled View logs in cloudwatch. Click on it.

A new window (Cloudwatch) will open, showing the Log group corresponding to the Lambda. Click on the topmost log stream.

The logs corresponding to the latest stream will be visible as follows.

Monitoring the predictions using these raw logs is hard. A dashboard showing the predictions and their counts makes monitoring much easier. Let's see how to stream these logs to Kibana and visualise them there.

Creating Elastic Search Cluster

Let's create an Elastic Search Cluster to which the logs will be streamed.

Sign in to the AWS Management Console and open the Elasticsearch Service at https://console.aws.amazon.com/es/home. Click on Create a New Domain.

Choose the domain type as Development and Testing (change according to your needs).

Give the domain a name: mlops-cluster

Choose the instance type as t2.small.elasticsearch (since this is a demo).

Choose the Network configuration as Public access so that the cluster can be shared easily with other people. (It can also be shared with VPC access, but that requires some more configuration.)

Get the IP address of your machine using this link and then add it to the domain access policy.
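To give an idea of what an IP-based domain access policy looks like, here is a small sketch that builds one as JSON. The region, account id and IP address are placeholders I made up (the IP is from a documentation-only range), and the console may generate a slightly different document for you, so treat this as an illustration rather than the exact policy.

```python
import json


def build_ip_access_policy(domain_arn, allowed_ips):
    """Build a resource-based policy that allows requests only from `allowed_ips`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "es:*",
                # "/*" covers every index and API path under the domain
                "Resource": f"{domain_arn}/*",
                "Condition": {"IpAddress": {"aws:SourceIp": list(allowed_ips)}},
            }
        ],
    }


# Placeholder values: replace with your own region, account id and IP address.
domain_arn = "arn:aws:es:us-east-1:123456789012:domain/mlops-cluster"
policy = build_ip_access_policy(domain_arn, ["203.0.113.10"])
policy_json = json.dumps(policy, indent=2)
```

Printing `policy_json` gives a document that can be compared with what the console shows in the access policy editor.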

Since the chosen instance type is t2.small, it does not support HTTPS encryption. Deselect that option.

After reviewing everything, create the domain. This will take some time (5+ minutes). Once the cluster is created, its status will be shown as Active.

Configuring Cloudwatch Logs with Elastic Search

Creating an IAM role with necessary permissions

In order to stream logs to the Elasticsearch cluster, Cloudwatch needs permission to write to the ES cluster. Let's create a role with the required permissions.

Go to the IAM console and Roles section. Click on Create Role

Select AWS Service, with Lambda as the use case.

Search for esfull and select the AmazonESFullAccess policy.

Give the role a name mlops-cluster-role and save it.
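Behind the console steps above, a role for the Lambda use case boils down to a trust policy plus the attached permissions policy. The sketch below builds that trust policy as JSON, just to show what is being created. The managed policy ARN is my assumption based on the usual ARN format for AWS managed policies, so verify it in the IAM console before relying on it.

```python
import json

ROLE_NAME = "mlops-cluster-role"

# Assumed ARN, following the standard format for AWS managed policies.
MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonESFullAccess"


def build_lambda_trust_policy():
    """Trust policy that lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


trust_policy_json = json.dumps(build_lambda_trust_policy(), indent=2)
```

The Lambda service principal is needed because the subscription filter streams logs to the cluster through a Lambda function that runs with this role.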

Configuring Elasticsearch cluster to Cloudwatch logs

Now that the role has been created, go to the Cloudwatch Log Group of the Lambda. Under Actions/Subscription Filters there will be an option called Create Elasticsearch Subscription Filter. Select it.

Select mlops-cluster under the Amazon ES cluster option. Select the mlops-cluster-role IAM Execution role.

Select the Log format as JSON, since we are printing the logs in JSON format. Filter patterns help filter out unnecessary logs and focus on the relevant ones. As a simple use case, let's set the filter pattern to prediction. This keeps only the logs that contain prediction.

Test the filter pattern by selecting one of the latest log streams and then clicking Test Pattern. The test results should now show only the prediction-related logs.
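To make the effect of the filter concrete, here is a rough local imitation of it. The log messages are hypothetical and the JSON shape (text plus a prediction with label and score) is my guess at what the Lambda prints. The match itself is a plain case-sensitive substring check, which only approximates how Cloudwatch evaluates a single-term filter pattern.

```python
import json


def format_prediction_log(text, label, score):
    """Serialise one prediction as a single-line JSON log message."""
    return json.dumps({"text": text, "prediction": {"label": label, "score": score}})


def matches_term(message, term):
    """Approximate a single-term filter pattern with a case-sensitive substring match."""
    return term in message


# Hypothetical log messages: one prediction log between two Lambda runtime lines.
log_messages = [
    "START RequestId: example-request-id Version: $LATEST",
    format_prediction_log("this is a sample sentence", "positive", 0.93),
    "END RequestId: example-request-id",
]

# Keep only the messages that mention "prediction", then parse them back into dicts.
prediction_logs = [m for m in log_messages if matches_term(m, "prediction")]
parsed = [json.loads(m) for m in prediction_logs]
```

Only the JSON message survives the filter, which is the behaviour we want from the Test Pattern step.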

Double-check the configuration and then create the subscription filter.

Creating Index Patterns in Kibana

Now that Cloudwatch is configured with Elasticsearch, let's go to the Kibana dashboard. The Kibana link can be accessed as below:

Go to the Discover section.

You might see a page like this. In that case, fire some queries so that some logs are created.

Once a few logs are there, the page will look like this. Click on Create Index Pattern.

Add the index pattern as cwl-* which indicates all the Cloudwatch Logs.

Also include the @timestamp field.
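If the wildcard in cwl-* is unclear, the following sketch shows which index names such a pattern would select. The index names are hypothetical (I am assuming the streamed indices carry a date suffix), and Python's glob matching is only a stand-in for Kibana's own wildcard handling.

```python
from fnmatch import fnmatchcase

INDEX_PATTERN = "cwl-*"

# Hypothetical index names; the date-suffixed form is an assumption.
index_names = ["cwl-2021.08.01", "cwl-2021.08.02", ".kibana_1"]

# Keep only the indices whose names match the pattern (case-sensitive).
matching = [name for name in index_names if fnmatchcase(name, INDEX_PATTERN)]
```

Kibana's internal index is left out, so the index pattern covers only the Cloudwatch log indices.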

Now we can see prediction.label, prediction.label.keyword, prediction.score, text in the extracted fields.

Once the pattern is created, the logs will be visible in Discover

Create a data table by selecting the relevant fields. I have selected the fields prediction.label, prediction.label.keyword, prediction.score and text.
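The same fields can also be summarised with an Elasticsearch aggregation, which is roughly what a visualisation of predictions and counts needs. The sketch below only builds the search body. The aggregation names (labels, avg_score) and the top_n parameter are names I picked for illustration.

```python
import json


def build_label_summary_query(top_n=10):
    """Build a search body that counts predictions per label and averages their scores."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "aggs": {
            "labels": {
                # the .keyword sub-field is used because terms aggregations need exact values
                "terms": {"field": "prediction.label.keyword", "size": top_n},
                "aggs": {"avg_score": {"avg": {"field": "prediction.score"}}},
            }
        },
    }


query_json = json.dumps(build_label_summary_query(), indent=2)
```

A body like this could be pasted into Kibana's Dev Tools console as a search against cwl-* to check the counts per label.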