Other articles

  1. Health Checks for ML Model Deployments

    Sun 15 January 2023

    Deploying machine learning models in RESTful services is a common way to make the model available for use within a software system. RESTful services are the most common type of service deployed, since they are very simple to build, have wide compatibility, and have lots of tooling available for them. In order to monitor the availability of the service, RESTful APIs often provide health check endpoints which make it easy for an outside system to verify that the service is up and running. A health check endpoint is a simple endpoint that can be called by a process manager to ascertain whether the application is running correctly. In this blog post we'll be working with Kubernetes so we'll focus on the health checks supported by Kubernetes.

    read more
  2. Policies for ML Models

    Wed 21 September 2022

    Machine learning models are being used to make ever more important decisions in the modern world. Because of the power of data modeling, ML models are able to learn the nuances of a domain and make accurate predictions even in situations where a human expert would not be able to. However, ML models are not omniscient and they should not run without oversight from their operators. To handle situations in which we don't want to have an ML model make predictions, we can create a policy that steps in before the prediction is returned to the user. A policy that is applied to an ML model is simply a rule that ensures that the model will never make predictions that are unsafe to use. For example, we can create a policy that make sure that a machine learning model that makes predictions about optimal airline ticket prices never makes predictions that cost the airline money. A good policy for an ML model is one that allows the model some leeway while also ensuring that the model’s predictions are safe to use. In this blog post, we'll write policies for ML models and deploy the policies alongside the model using the decorator pattern.

    read more
  3. Load Tests for ML Models

    Thu 01 September 2022

    In a previous blog post we showed how to create a RESTful model service for a machine learning model that we want to deploy. A common requirement for RESTful services is to be able to be able to continue working while being used by many users at the same time. In this blog post we'll show how to create a load testing script for an ML model service.

    read more
  4. Caching for ML Model Deployments

    Wed 10 August 2022

    In a software system, a cache is a data store that is used to temporarily store computation results or frequently-accessed data. When accessing the results of a computation from a cache, we are able to avoid paying the cost of recomputing the result. When accessing a frequently accessed piece of data we are able to avoid paying the cost of accessing the data from a slower data store. This type of caching is used when accessing data from a slower data store than the cache. When a cache hit occurs, the data being sought is found and returned to the caller. When a β€œmiss” occurs, the data is not found and must be recomputed or accessed from the slower data store by the caller. A data cache is generally built using storage that has low latency, which means that it is more expensive to run. Machine learning model deployments can benefit from caching because making predictions with a model can be a CPU-intensive process, especially for large and complex models. Predictions that take a long time to make can be cached and returned later when the same prediction is requested. This type of caching is also known as memoization. Another reason that a prediction can take a long time to create is if data enrichment is needed. Data enrichment is the process of adding fields to a model's input from a data store before a prediction is made, this process can add latency to the prediction and can benefit from caching.

    read more

Page 1 / 5 »