Cortex: Open-source ML model serving infrastructure

May 13, 2020, 10:38 a.m. By: Merlyn Shelley


Are you looking to reduce response latency when deploying your trained machine learning model?

Are you struggling with run-time challenges such as memory capacity and concurrent users when moving your machine learning model from your own computer to cloud services?

What if I told you there is a cost-effective infrastructure for massive production workloads?

Yup, there is a powerful solution that eases the problems data scientists and machine learning developers face when running their models on cloud infrastructure.

It is Cortex, a complete machine learning model serving infrastructure developed by Cortex Lab, an early-stage startup based in California.

Now, let's take a detailed look at Cortex.

Cortex is an open-source platform designed for deploying trained machine learning models directly as a web service in production.

Cortex makes it easy to run real-time inference at scale. It provides the essential support needed to deploy a trained machine learning model without much hassle, including autoscaling, CPU and GPU support, multiple frameworks, spot instances and rolling updates that enable rapid iteration with minimal downtime. Cortex doesn't require extensive configuration, which makes it easy to launch and keep running.
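To make "deploying a trained model as a web service" concrete, here is a minimal sketch of the kind of predictor class Cortex's Python interface expected around its 2020 releases: a constructor that loads the model once per replica, and a predict method called for every request with the parsed JSON body. The class name PythonPredictor, the config argument and the predict(payload) signature follow our reading of the Cortex 0.x documentation and may differ between versions. The averaging "model" and the threshold config key are hypothetical stand-ins for a real TensorFlow, PyTorch or scikit-learn model.

```python
class PythonPredictor:
    """Sketch of a Cortex-style predictor (interface assumed from Cortex 0.x docs)."""

    def __init__(self, config):
        # Runs once per replica at startup: load the model here, not per request.
        # `threshold` is a hypothetical config key used only by this toy model.
        self.threshold = float(config.get("threshold", 0.5))

    def predict(self, payload):
        # Runs for every request; `payload` is the parsed JSON request body.
        features = payload["features"]
        score = sum(features) / len(features)
        return {"score": score, "label": int(score >= self.threshold)}


if __name__ == "__main__":
    # Local smoke test without Cortex: call the predictor directly.
    predictor = PythonPredictor({"threshold": 0.5})
    print(predictor.predict({"features": [0.25, 0.75, 1.0]}))
```

Keeping the expensive work (loading weights) in the constructor and the cheap work in predict is what lets each replica answer many requests without reloading the model.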

Sounds interesting? Then read on for an overview of Cortex.

Let's look at the key features of Cortex:

Multiple frameworks: Cortex can deploy models built with Python-based machine learning frameworks such as TensorFlow, PyTorch, scikit-learn and Keras, as well as other Python models. This broad compatibility means a single deployment infrastructure can serve models from all of these frameworks.

Autoscaling: Cortex automatically scales prediction APIs up and down to handle fluctuations in production workloads. Adding replicas when traffic spikes helps keep response latency low.
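The idea behind request-based autoscaling can be sketched as a simple rule: pick enough replicas that each one handles roughly a target number of concurrent requests, clamped between a minimum and a maximum. This is a hypothetical illustration, not Cortex's source code; the function and parameter names (desired_replicas, target_per_replica, min_replicas, max_replicas) are our own assumptions, and the real autoscaler also smooths its decisions over time windows.

```python
import math


def desired_replicas(in_flight_requests: float,
                     target_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 100) -> int:
    """Concurrency-based scaling rule (hypothetical sketch, not Cortex's code).

    Choose enough replicas that each handles about `target_per_replica`
    concurrent requests, then clamp to [min_replicas, max_replicas].
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    needed = math.ceil(in_flight_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    for load in (0, 3, 25, 1000):
        print(load, "->", desired_replicas(load, target_per_replica=5, max_replicas=20))
```

With a target of 5 concurrent requests per replica, 25 in-flight requests map to 5 replicas, an idle API stays at the minimum of 1, and 1,000 requests are capped at 20. The floor keeps a warm replica ready for the next request; the ceiling bounds cost.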

CPU / GPU support: Cortex can run inference on CPUs for lightweight models and on GPUs for large deep learning models that need fast, real-time API responses for the best end-user experience.

Spot instances: AWS spot instances are spare EC2 capacity sold at a discount, with the caveat that AWS can reclaim them when it needs the capacity back. Because Cortex is built for fault tolerance, it can manage the cluster to keep the APIs up and reliable even when instances are interrupted, which can save a lot of money.
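The savings are simple arithmetic: the same instance-hours billed at the spot rate instead of the on-demand rate. The sketch below computes monthly savings for a fleet; the prices in the example are hypothetical placeholders, not real AWS quotes, since spot prices vary by region, instance type and time.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 hours / 12)


def spot_savings(on_demand_hourly: float, spot_hourly: float,
                 instances: int, hours: float = HOURS_PER_MONTH):
    """Return (absolute savings, fractional savings) of spot vs on-demand.

    Prices are caller-supplied inputs, not real AWS quotes.
    """
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    on_demand_cost = on_demand_hourly * instances * hours
    spot_cost = spot_hourly * instances * hours
    saved = on_demand_cost - spot_cost
    return saved, saved / on_demand_cost


if __name__ == "__main__":
    # Hypothetical prices for illustration only.
    saved, fraction = spot_savings(on_demand_hourly=1.00, spot_hourly=0.30, instances=4)
    print(f"saved ${saved:,.2f} per month ({fraction:.0%})")
```

This calculation ignores the cost of interruptions themselves; tolerating those interruptions without downtime is the part that Cortex's fault tolerance is meant to handle.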

Rolling updates: Cortex can roll out an updated model to the deployed APIs without any downtime.
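A rolling update avoids downtime by replacing replicas gradually instead of all at once. The simulation below is a hypothetical sketch of a "surge by one" strategy: start one new-version replica, and only once it is ready retire one old replica, so serving capacity never drops below the original replica count. It is not Cortex's implementation; Kubernetes Deployments, which Cortex builds on, use a similar strategy governed by their maxSurge and maxUnavailable settings. Replica names here are made up.

```python
from typing import List


def rolling_update(replicas: List[str], new_version: str) -> List[List[str]]:
    """Simulate a surge-by-one rolling update (hypothetical sketch).

    For each old replica: start one new-version replica (assumed to become
    ready immediately), then retire the old one. Returns the list of serving
    replicas after every step, so capacity can be checked throughout.
    """
    serving = list(replicas)
    history = [list(serving)]
    for i, old in enumerate(replicas):
        serving.append(f"{new_version}-{i}")  # surge: bring up a new replica
        history.append(list(serving))
        serving.remove(old)                   # then retire one old replica
        history.append(list(serving))
    return history


if __name__ == "__main__":
    for step in rolling_update(["v1-0", "v1-1", "v1-2"], "v2"):
        print(step)
```

At every step at least three replicas are serving, and the final state contains only v2 replicas, which is why clients never see the API go away during the switch.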

Log streaming: Cortex keeps the logs from deployed models and streams them directly to your command-line interface using familiar docker-like syntax. This lets you check that incoming request payloads match the input the model expects.

Prediction monitoring: A deployed model has to be monitored. Cortex can track API performance and predictions continuously, so you can confirm that models keep producing the expected output.
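One common way to check that a model is "performing as expected" is to compare the distribution of its predictions against an expected baseline and to watch latency. The class below is a hypothetical in-process sketch of that idea, not Cortex's monitoring feature; the names PredictionMonitor and expected_share and the 10% default tolerance are our own assumptions.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


class PredictionMonitor:
    """Track predicted labels and latencies (hypothetical sketch)."""

    def __init__(self, expected_share: Dict[str, float], tolerance: float = 0.1):
        self.expected_share = expected_share  # e.g. {"cat": 0.5, "dog": 0.5}
        self.tolerance = tolerance            # allowed absolute deviation in share
        self.counts: Counter = Counter()
        self.latencies_ms: List[float] = []

    def record(self, label: str, latency_ms: float) -> None:
        self.counts[label] += 1
        self.latencies_ms.append(latency_ms)

    def drifted_labels(self) -> List[str]:
        """Labels whose observed share deviates from the baseline by > tolerance."""
        total = sum(self.counts.values())
        if total == 0:
            return []
        return sorted(
            label
            for label, share in self.expected_share.items()
            if abs(self.counts[label] / total - share) > self.tolerance
        )

    def mean_latency_ms(self) -> float:
        return mean(self.latencies_ms) if self.latencies_ms else 0.0


if __name__ == "__main__":
    monitor = PredictionMonitor({"cat": 0.5, "dog": 0.5})
    for _ in range(8):
        monitor.record("cat", 10.0)
    for _ in range(2):
        monitor.record("dog", 20.0)
    print(monitor.drifted_labels(), monitor.mean_latency_ms())
```

A sudden swing in the label mix (here 80% "cat" against an expected 50%) is often the first visible symptom of changed input data, even before accuracy can be measured.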

Limited configuration: Cortex installation and deployment are configured with simple YAML files: cluster.yaml defines a predictable cluster, and cortex.yaml defines predictable model deployments. That's all there is to the configuration.

Now let's see how Cortex was built to work so effectively.

According to the founders of Cortex Lab, the idea was to build a uniform API for quickly deploying machine learning models to the cloud. To do that, they took open-source tools such as TensorFlow, Docker and Kubernetes and combined them with AWS services such as CloudWatch, EKS (Elastic Kubernetes Service) and S3 (Simple Storage Service), providing a single API for deploying any machine learning model.
