A RESTful ML Model Service

Thu 29 April 2021

Introduction

Sometimes you find yourself writing the same code over and over. When that starts happening, you know it's time to take what you've learned and create a reusable piece of code that can be applied in the future. Because of the experience we've gained writing previous blog posts, now is a good time to build a reusable service that can host any number of machine learning models.

In previous blog posts we've built many different types of services that can host ML models. In this blog post we'll aim at building a reusable service that can host an ML model behind a RESTful API. APIs are called RESTful when they follow the constraints of the REST architectural style. REST stands for Representational State Transfer; it is an architectural style built on top of the HTTP protocol that is useful for building web applications. RESTful APIs are widely used in production systems and are an industry standard for integrating different systems.

The features that we want this reusable service to have are simple. We want to be able to install the service code as a package, that is to say, through the pip Python package manager. We want the API of the service to follow well-established standards, in this case the REST style for web APIs. We want to be able to configure the service to host any number of ML models. Lastly, we want the service to be self-documenting, so that we don't have to create OpenAPI documentation for it manually.

All of these things are possible and indeed easy to implement because we will be relying on a common interface for all the ML models that the service will host. This interface is the MLModel interface and it is defined in another package that we've already created. The interface and the package are fully described in a previous blog post. By requiring that every model we want to host in the service fulfill the requirements of the interface, we are able to write the service once and reuse it.

The MLModel interface is very simple. It requires that a model class be created with two methods: an __init__ method that initializes the model object and a predict method that actually makes a prediction. This approach is very similar to the approach taken by Uber in their internal ML platform; they describe how they structure their ML model code here. SeldonCore, an open source project for deploying ML models, takes a similar approach, which is described here. In this blog post we will leverage the standardization that the MLModel interface makes possible to write a RESTful service that can host any model that follows the standard.
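
To make that concrete, here is a rough sketch of the shape of the interface; the exact definition lives in the ml_base package, so treat the details here as illustrative:

from abc import ABC, abstractmethod


class MLModel(ABC):
  # metadata properties: display_name, qualified_name, description, version
  # schema properties: input_schema and output_schema, which are pydantic models

  @abstractmethod
  def __init__(self) -> None:
    """Load any model artifacts and initialize the model object."""

  @abstractmethod
  def predict(self, data):
    """Make a prediction with the model."""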

Package Structure

The service codebase will be structured into the following files:

- rest_model_service
  - __init__.py
  - configuration.py      # data models for configuration
  - generate_openapi.py   # script to generate an openapi spec
  - main.py               # entry point for service
  - routes.py             # controllers for routes
  - schemas.py            # service schemas
- tests
- requirements.txt
- setup.py
- test_requirements.txt

This structure can be seen in the GitHub repository.

FastAPI

Now that we have a set of requirements and have described our approach, let's start building the REST service. For the web framework, we'll use the popular FastAPI framework. FastAPI is a modern framework for building web applications that runs on Python 3.6 and above. One of the great things about it is that it uses type hints by default, which helps to reduce the number of bugs in your code. By using the pydantic package for defining schemas, a FastAPI application can generate an OpenAPI specification for your application without any extra effort. Because FastAPI supports asynchronous operations, it is also one of the fastest Python web frameworks available. FastAPI is a great choice for our REST service because it follows a number of best practices by default, which will raise the quality of our code. The ml_base package uses pydantic to define model input and output schemas, which makes interfacing with FastAPI very easy.
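
As a quick illustration of what we get for free, here is a minimal, self-contained FastAPI example (not part of the service code) that defines a single endpoint; running it with uvicorn gives us request validation, an OpenAPI specification, and an interactive /docs page without any extra work:

from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Example Service")


class Greeting(BaseModel):
  message: str = Field(description="A friendly greeting.")


@app.get("/greeting", response_model=Greeting)
async def get_greeting() -> Greeting:
  # FastAPI uses the type hint and the pydantic model to validate the
  # response and to generate the OpenAPI documentation served at /docs
  return Greeting(message="Hello, world!")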

We'll build up our understanding of how the service works by exploring the individual endpoints of the service. An endpoint is simply a point through which the service interacts with the outside world. The service has two types of endpoints: the metadata endpoint and all of the model endpoints. We'll talk about the metadata endpoint first.

Model Metadata Endpoint

The service needs to be able to expose information about the models that it is hosting to client systems. To do this, we'll add an endpoint that returns model metadata. The first thing we need to do is create the data model for the information that the endpoint will return:

from pydantic import BaseModel, Field


class ModelMetadata(BaseModel):
  """Metadata of a model."""
  display_name: str = Field(description="The display name of the model.")
  qualified_name: str = Field(description="The qualified name of the model.")
  description: str = Field(description="The description of the model.")
  version: str = Field(description="The version of the model.")

The code above can be found here.

The ModelMetadata object represents one model that is being hosted by the service. We actually want to be able to host many models within the service, so we need to create a "collection" data model that can hold many ModelMetadata objects:

class ModelMetadataCollection(BaseModel):
  """Collection of model metadata."""

  models: List[ModelMetadata] = Field(description="A collection of model descriptions.")

The code above can be found here.

Now that we have the data models, we can build the function that the client will interact with to get the model metadata:

async def get_models():
  try:
    # the ModelManager singleton holds references to all loaded models
    model_manager = ModelManager()
    models_metadata_collection = model_manager.get_models()
    models_metadata_collection = ModelMetadataCollection(**{"models": models_metadata_collection}).dict()
    return JSONResponse(status_code=200, content=models_metadata_collection)
  except Exception as e:
    # any unexpected error is returned to the client as a 500 response
    error = Error(type="ServiceError", message=str(e)).dict()
    return JSONResponse(status_code=500, content=error)

The code above can be found here.

The function does not accept any parameters because we don't need to select any specific model; we want to return metadata about all of the models. The first thing the function does is instantiate the ModelManager singleton. The ModelManager is a simple utility that we use to manage model instances; we described how it operates in a previous blog post. The ModelManager object should already contain instances of models, and by calling the get_models() method, we can get the metadata that we will return to the client.

The models_metadata_collection object is instantiated using the data model we created above and returned as a JSONResponse to the client. If anything goes wrong, the function catches the exception and returns a JSONResponse with the error details and a 500 status code.
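
For example, with the mocked iris_model that we'll set up later in this post loaded into the service, a GET request to /api/models returns a response along these lines:

{
  "models": [
    {
      "display_name": "Iris Model",
      "qualified_name": "iris_model",
      "description": "Model for predicting the species of a flower based on its measurements.",
      "version": "1.0.0"
    }
  ]
}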

Prediction Endpoint

To enable the service to host many instances of models, the code for the prediction endpoint needs to be a bit more complex than the metadata endpoint. We'll use a class instead of a function to create the controller for the endpoint:

class PredictionController(object):
  def __init__(self, model: MLModel) -> None:
    self._model = model

The code above can be found here.

The class is initialized with a reference to the instance of the model that it will be hosting. In this way, we can instantiate one controller object for each model that is living inside of the model service. To make predictions with the model, we'll add a method:

def __call__(self, data):
  try:
    prediction = self._model.predict(data).dict()
    return JSONResponse(status_code=200, content=prediction)
  except MLModelSchemaValidationException as e:
    error = Error(type="SchemaValidationError", message=str(e)).dict()
    return JSONResponse(status_code=400, content=error)
  except Exception as e:
    error = Error(type="ServiceError", message=str(e)).dict()
    return JSONResponse(status_code=500, content=error)

The code above can be found here.

The method is a dunder method named "__call__". This type of dunder method makes an object instantiated from the class behave like a function, which means that once we instantiate it, we'll be able to register it as an endpoint on the service.
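
To illustrate the idea with a tiny standalone example (not part of the service code), any class that defines __call__ produces instances that can be called like functions:

class Greeter(object):
  def __init__(self, name):
    self._name = name

  def __call__(self):
    return "Hello, {}!".format(self._name)


greeter = Greeter("world")
greeter()   # returns "Hello, world!", the instance behaves like a function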

The method is pretty simple: it takes the data object and sends it to the model to make a prediction. It then returns a JSONResponse that contains the prediction and a 200 status code. This response is returned by the service if everything goes well. If the model raises an MLModelSchemaValidationException, the method returns a JSONResponse with a 400 status code. For any other exception, the method returns a 500 status code.

In the next section we'll see how this class is instantiated in order to allow the service to host any number of MLModel instances. We'll also see how we use the input and output models provided by each model object to create the documentation automatically.

Application Startup

At startup, the service does not know anything about which models it will be hosting, so it needs to load a configuration file to find out. In the main.py file, the configuration file is loaded from disk with this code:

if os.environ.get("REST_CONFIG") is not None:
  file_path = os.environ["REST_CONFIG"]
else:
  file_path = "rest_config.yaml"

if path.exists(file_path) and path.isfile(file_path):
  with open(file_path) as file:
    configuration = yaml.full_load(file)
    configuration = Configuration(**configuration)
    app = create_app(configuration.service_title, configuration.models)
else:
  raise ValueError("Could not find configuration file '{}'.".format(file_path))

The code above can be found here.
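
The Configuration class that the YAML data is loaded into, along with the Model class it references, are simple pydantic models defined in the configuration.py module. Based on the fields used in the configuration file shown later in this post, they look roughly like this (a sketch, not the exact definitions):

from typing import List

from pydantic import BaseModel


class Model(BaseModel):
  qualified_name: str    # qualified name of the model to host
  class_path: str        # full import path of the MLModel class
  create_endpoint: bool  # whether to create a prediction endpoint for the model


class Configuration(BaseModel):
  service_title: str     # title of the service, used in the OpenAPI specification
  models: List[Model]    # the models that the service will host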

The default configuration file path is "rest_config.yaml", which is used if no other path is provided to the service. To provide an alternative path, we can set it in the "REST_CONFIG" environment variable. Once we have the YAML file loaded, we can call the create_app() function, which creates the FastAPI application object.

def create_app(service_title: str, models: List[Model]) -> FastAPI:
  app: FastAPI = FastAPI(title=service_title, version=__version__)
  app.add_api_route("/",
    get_root,
    methods=["GET"])
  app.add_api_route("/api/models",
    get_models,
    methods=["GET"],
    response_model=ModelMetadataCollection,
    responses={
      500: {"model": Error}
    })

The code above can be found here.

The create_app() function first creates the app object with the service title that we loaded from the configuration file and the version. We then add two routes to the app: the root route and the model metadata route. The root route simply reroutes the request to the /docs route which hosts the auto-generated documentation. The model metadata route returns metadata for all of the models hosted by the service.
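
The get_root controller is not shown above; based on the behavior just described, it is essentially a redirect to the auto-generated documentation page, something along these lines (a sketch, the actual implementation may differ):

from fastapi.responses import RedirectResponse


async def get_root():
  # send clients that hit the root of the API to the documentation page
  return RedirectResponse(url="/docs")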

The next thing the function does is actually load the models:

model_manager = ModelManager()

for model in models:
  model_manager.load_model(model.class_path)

  if model.create_endpoint:
    model = model_manager.get_model(model.qualified_name)
    controller = PredictionController(model=model)
    controller.__call__.__annotations__["data"] = model.input_schema
    app.add_api_route("/api/models/{}/prediction".format(model.qualified_name),
      controller,
      methods=["POST"],
      response_model=model.output_schema,
      description=model.description,
      responses={
        400: {"model": Error},
        500: {"model": Error}
      })
  else:
    logger.info("Skipped creating an endpoint for model: {}".format(model.qualified_name))

return app

The code above can be found here.

The first thing we do is instantiate the ModelManager singleton. Next, we process each model in the configuration. For each model, we load it into the ModelManager and then create an endpoint for it. An endpoint is only created for a model if the configuration sets the "create_endpoint" option to true for that model.

Creating an endpoint for a model is a little tricky because we need to create it dynamically and supply all of the options that FastAPI supports.

To create an endpoint for a model, we first need to get a reference to the model from the ModelManager singleton. We then instantiate the PredictionController class and pass the model reference to the __init__() method of the class. We now have a callable object that we can register with the FastAPI application as an endpoint controller. Before we can do that, we need to add an annotation to the controller's __call__ method that will allow FastAPI to automatically create documentation for the endpoint; we annotate it with the pydantic type that the model accepts as input. Now we are ready to register the controller; when we do that, we also provide the FastAPI app with the HTTP method, response pydantic model, description, and error response models. All of these options give the FastAPI app information about the endpoint, which will be used later to auto-generate the documentation.

Creating a Package

This service will be most useful when it can be "added on" to a model project so that it can provide the deployment functionality for a machine learning model without becoming part of the model's codebase. If we take this approach, then the rest_model_service package is installed in the Python environment and lives as a dependency of the ML model package.

To enable all of this, the rest_model_service package is published to PyPI and can be installed using the pip package manager. To install the package into your project, you can execute this command:

pip install rest_model_service

Once the service package is installed, we can use it within an ML model project to create a RESTful service for the model.

Using the Service

In order to try out the service we'll need a model that follows the MLModel interface. There is a simple mocked model in the tests.mocks module that we'll use to try out the service:

from enum import Enum

from pydantic import BaseModel, Field

from ml_base import MLModel


class IrisModelInput(BaseModel):
  sepal_length: float = Field(gt=5.0, lt=8.0, description="Length of the sepal of the flower.")
  sepal_width: float = Field(gt=2.0, lt=6.0, description="Width of the sepal of the flower.")
  petal_length: float = Field(gt=1.0, lt=6.8, description="Length of the petal of the flower.")
  petal_width: float = Field(gt=0.0, lt=3.0, description="Width of the petal of the flower.")


class Species(str, Enum):
  iris_setosa = "Iris setosa"
  iris_versicolor = "Iris versicolor"
  iris_virginica = "Iris virginica"


class IrisModelOutput(BaseModel):
  species: Species = Field(description="Predicted species of the flower.")


class IrisModel(MLModel):
  display_name = "Iris Model"
  qualified_name = "iris_model"
  description = "Model for predicting the species of a flower based on its measurements."
  version = "1.0.0"
  input_schema = IrisModelInput
  output_schema = IrisModelOutput

  def __init__(self):
    pass

  def predict(self, data):
    return IrisModelOutput(species="Iris setosa")

The code above can be found here.

The mock model class works just like any other MLModel class, but it always returns a prediction of "Iris setosa". As you can see, the model references the IrisModelInput and IrisModelOutput pydantic models for its input and output.
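
Because the mock follows the MLModel interface, we can try it out directly in Python before putting it behind the service:

from tests.mocks import IrisModel, IrisModelInput

model = IrisModel()
prediction = model.predict(IrisModelInput(
  sepal_length=6.0, sepal_width=3.1, petal_length=4.5, petal_width=1.4))
print(prediction.species.value)   # prints "Iris setosa"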

Once we have a model, we'll need to add a configuration file to the project; the model service uses it to find the models that we want to deploy. The configuration file should look like this:

service_title: REST Model Service
models:
  - qualified_name: iris_model
    class_path: tests.mocks.IrisModel
    create_endpoint: true

This file can be found in the examples folder here.

To start up the service locally, we need to point the service at the configuration file using an environment variable and then execute the uvicorn command:

export PYTHONPATH=./
export REST_CONFIG=examples/rest_config.yaml
uvicorn rest_model_service.main:app --reload

The service should start and we can view the documentation page on port 8000:

Documentation

As you can see, the root endpoint and model metadata endpoint are part of the API. We also have an automatically generated endpoint for the iris_model mocked model that we added to the service through the configuration. The model's input and output data models are also added to the documentation:

Model Endpoint

We can even try a prediction out:

Prediction

Of course, the prediction will always be the same because it's a mocked model.
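
For reference, the same prediction can be made from the command line once the service is running; the field values just need to fall within the ranges defined by IrisModelInput:

curl -X POST "http://localhost:8000/api/models/iris_model/prediction" \
  -H "Content-Type: application/json" \
  -d '{"sepal_length": 6.0, "sepal_width": 3.1, "petal_length": 4.5, "petal_width": 1.4}'

The response should look like this:

{"species": "Iris setosa"}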

Generating the OpenAPI Contract

The FastAPI application actually generates the OpenAPI service specification at runtime, and it is available for download from the documentation page. However, we'd like to generate the specification and save it to a file in source control. To do this we can use a script provided by the rest_model_service package called "generate_openapi". The script is installed along with the package and registered as a command within the environment where the package is installed. Here is how to use it:

export PYTHONPATH=./
export REST_CONFIG=examples/rest_config.yaml
generate_openapi --output_file=example.yaml

The script uses the same configuration that the service uses, but it doesn't run the webservice. It instead uses the FastAPI framework to generate the contract and saves it to the output file.

The generated contract will look like this:

info:
  title: REST Model Service
  version: <version_placeholder>
openapi: 3.0.2
paths:
  /:
    get:
      description: Root of API.
      operationId: get_root__get
      responses:
        '200':
          content:
            application/json:
              schema: {}
          description: Successful Response
      summary: Get Root
  /api/models:
    get:
      description: List of models available.
...

Closing

In this blog post we've shown how to create a web service that is easy to install, configure, and deploy, and that can host any machine learning model that we throw at it. By using the MLModel base class, any model can be made to work with the service. When deploying machine learning models to production systems, it's a common practice to create a custom service that "wraps" around the model code and creates an interface that other systems can use to access the model. With the approach described in this blog post, the service is created automatically by using the interface definition provided by the model itself. Furthermore, the documentation is also created automatically by using the tooling provided by FastAPI. Lastly, we've made the service easy to add to any project by publishing the package to the PyPI repository, from where it can be installed with a simple "pip install" command.

The service currently cannot host any extra code that is not model code. When deploying a model into a production setting, we often have extra logic that needs to be deployed alongside the model but is not technically part of it. This is usually called the "business logic" of the solution. Granted, it is possible to throw the business logic into the model class and deploy it that way, but this combines the two kinds of code into one class, which makes the code harder to test and reason about. To fix this shortcoming, we could add "plugin points" that allow us to insert our own logic before and after the model executes, which is where the business logic could live.

Another way in which we could improve the service in the future is to allow more configuration of the models when they are instantiated by the service. Right now, it's not possible to customize a model when it is created by the service at startup time. In the future, it would be nice to allow the service configuration to hold parameters that are passed to the model classes when they are instantiated.
