Health Checks for ML Model Deployments

Sun 15 January 2023

Health Checks for ML Model Deployments

In a previous blog post we showed how to create a RESTful model service for a machine learning model. In this blog post, we'll extend the model service API by adding health checks to it.

This blog post was written in a Jupyter notebook, some of the code and commands found in it reflect this.

All of the code for this blog post is in this github repository.


Deploying machine learning models in RESTful services is a common way to make the model available for use within a software system. In general, RESTful services are the most common type of service deployed, since they are simple to build, have wide compatibility, and have lots of tooling available for them. In order to monitor the availability of the service, RESTful APIs often provide health check endpoints which make it easy for an outside system to verify that the service is up and running. A health check endpoint is a simple endpoint that can be called by a process manager to ascertain whether the application is running correctly. In this blog post we'll be working with Kubernetes so we'll focus on the health checks supported by Kubernetes.

There are several types of health check endpoints supported by Kubernetes: startup, readiness, and liveness health checks. Startup checks are used to check if an application has started. If the container has a startup check configured on it, Kubernetes will wait until the application has finished starting up before moving on with the process of making the application available to clients. Startup checks are useful for applications that take a while to startup. Startup checks are only called during application startup, once an application has finished starting up the startup check is not called again.

Readiness checks are used to check if a container is ready to start accepting requests. Once the application has finished starting up, Kubernetes uses the readiness check to make sure that the application is able to accept requests. Service readiness can change during the service lifecycle so the check is called continuously until the application is stopped.

Liveness checks are used to restart a pod if the application is not responding. They are are the simplest type of check to implement in the application because they should always succeed if the server process is running. Liveness checks are useful to detect if the application is in an unsafe state, if the liveness check fails the process manager needs to restart the application to get it out of the unsafe state. Liveness checks are also called continuously while the application is running.

In this blog post, we’ll be adding startup, readiness, and liveness checks to a RESTful model service that is hosting a machine learning model. We'll also build a model that requires health checks in order to be deployed correctly.

Getting Data

In order to train a model, we first need to have a dataset. We went into Kaggle and found a dataset that contained loan information. To make it easy to download the data, we installed the kaggle python package. Then we executed these commands to download the data and unzip it into the data folder in the project:

from IPython.display import clear_output
!mkdir -p ../data

!kaggle datasets download -d ranadeep/credit-risk-dataset -p ../data --unzip


Let's look at the data files:

!ls -la ../data
total 67648
drwxr-xr-x   6 brian  staff       192 Nov 21 14:45 .
drwxr-xr-x  27 brian  staff       864 Jan 11 21:23 ..
-rw-r--r--@  1 brian  staff      6148 Oct 31 17:30 .DS_Store
-rw-r--r--   1 brian  staff     20995 Nov 21 13:49 LCDataDictionary.xlsx
-rw-r--r--   1 brian  staff  34603008 Nov 21 14:58
drwxr-xr-x   3 brian  staff        96 Nov 16 09:33 loan

Looks like the data is in a .csv file in the /loan folder and the data dictionary is in an excel spreadsheet file.

Let's load the data .csv file into a pandas dataframe.

import pandas as pd

data = pd.read_csv("../data/loan/loan.csv", low_memory=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 887379 entries, 0 to 887378
Data columns (total 74 columns):
 #   Column                       Non-Null Count   Dtype  
---  ------                       --------------   -----  
 0   id                           887379 non-null  int64  
 1   member_id                    887379 non-null  int64  
 2   loan_amnt                    887379 non-null  float64
 3   funded_amnt                  887379 non-null  float64
 4   funded_amnt_inv              887379 non-null  float64
 5   term                         887379 non-null  object 
 6   int_rate                     887379 non-null  float64
 7   installment                  887379 non-null  float64
 8   grade                        887379 non-null  object 
 9   sub_grade                    887379 non-null  object 
 10  emp_title                    835917 non-null  object 
 11  emp_length                   842554 non-null  object 
 12  home_ownership               887379 non-null  object 
 13  annual_inc                   887375 non-null  float64
 14  verification_status          887379 non-null  object 
 15  issue_d                      887379 non-null  object 
 16  loan_status                  887379 non-null  object 
 17  pymnt_plan                   887379 non-null  object 
 18  url                          887379 non-null  object 
 19  desc                         126028 non-null  object 
 20  purpose                      887379 non-null  object 
 21  title                        887227 non-null  object 
 22  zip_code                     887379 non-null  object 
 23  addr_state                   887379 non-null  object 
 24  dti                          887379 non-null  float64
 25  delinq_2yrs                  887350 non-null  float64
 26  earliest_cr_line             887350 non-null  object 
 27  inq_last_6mths               887350 non-null  float64
 28  mths_since_last_delinq       433067 non-null  float64
 29  mths_since_last_record       137053 non-null  float64
 30  open_acc                     887350 non-null  float64
 31  pub_rec                      887350 non-null  float64
 32  revol_bal                    887379 non-null  float64
 33  revol_util                   886877 non-null  float64
 34  total_acc                    887350 non-null  float64
 35  initial_list_status          887379 non-null  object 
 36  out_prncp                    887379 non-null  float64
 37  out_prncp_inv                887379 non-null  float64
 38  total_pymnt                  887379 non-null  float64
 39  total_pymnt_inv              887379 non-null  float64
 40  total_rec_prncp              887379 non-null  float64
 41  total_rec_int                887379 non-null  float64
 42  total_rec_late_fee           887379 non-null  float64
 43  recoveries                   887379 non-null  float64
 44  collection_recovery_fee      887379 non-null  float64
 45  last_pymnt_d                 869720 non-null  object 
 46  last_pymnt_amnt              887379 non-null  float64
 47  next_pymnt_d                 634408 non-null  object 
 48  last_credit_pull_d           887326 non-null  object 
 49  collections_12_mths_ex_med   887234 non-null  float64
 50  mths_since_last_major_derog  221703 non-null  float64
 51  policy_code                  887379 non-null  float64
 52  application_type             887379 non-null  object 
 53  annual_inc_joint             511 non-null     float64
 54  dti_joint                    509 non-null     float64
 55  verification_status_joint    511 non-null     object 
 56  acc_now_delinq               887350 non-null  float64
 57  tot_coll_amt                 817103 non-null  float64
 58  tot_cur_bal                  817103 non-null  float64
 59  open_acc_6m                  21372 non-null   float64
 60  open_il_6m                   21372 non-null   float64
 61  open_il_12m                  21372 non-null   float64
 62  open_il_24m                  21372 non-null   float64
 63  mths_since_rcnt_il           20810 non-null   float64
 64  total_bal_il                 21372 non-null   float64
 65  il_util                      18617 non-null   float64
 66  open_rv_12m                  21372 non-null   float64
 67  open_rv_24m                  21372 non-null   float64
 68  max_bal_bc                   21372 non-null   float64
 69  all_util                     21372 non-null   float64
 70  total_rev_hi_lim             817103 non-null  float64
 71  inq_fi                       21372 non-null   float64
 72  total_cu_tl                  21372 non-null   float64
 73  inq_last_12m                 21372 non-null   float64
dtypes: float64(49), int64(2), object(23)
memory usage: 501.0+ MB

We'll be predicting credit risk. Let's select the most promising columns in the dataset so that we wont have to deal with all of the columns.

data = data[[

To make the data easier to explore we'll rename the columns to make their names more user friendly.

data = data.rename(columns={
    "annual_inc": "AnnualIncome", 
    "collections_12_mths_ex_med": "CollectionsInLast12Months",
    "delinq_2yrs": "DelinquenciesInLast2Years", 
    "dti": "DebtToIncomeRatio", 
    "emp_length": "EmploymentLength", 
    "home_ownership": "HomeOwnership", 
    "acc_now_delinq": "NumberOfDelinquentAccounts", 
    "installment": "MonthlyInstallmentPayment",
    "int_rate": "InterestRate", 
    "last_pymnt_amnt": "LastPaymentAmount",  
    "loan_amnt": "LoanAmount",
    "loan_status": "LoanStatus", 
    "pub_rec": "DerogatoryPublicRecordCount", 
    "purpose": "LoanPurpose",
    "revol_util": "RevolvingLineUtilizationRate", 
    "term": "Term", 
    "total_pymnt_inv": "TotalPaymentsToDate", 
    "verification_status": "VerificationStatus",

We'll also build a simple data dictionary with the column descriptions that were downloaded with the dataset.

data_dictionary = {
    "AnnualIncome": "The self-reported annual income provided by the borrower during registration.", 
    "CollectionsInLast12Months": "Number of collections in 12 months excluding medical collections.",
    "DelinquenciesInLast2Years": "The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years.", 
    "DebtToIncomeRatio": "A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.", 
    "EmploymentLength": "Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years.", 
    "HomeOwnership": "The home ownership status provided by the borrower during registration. Our values are: RENT, OWN, MORTGAGE, OTHER.",
    "NumberOfDelinquentAccounts": "The number of accounts on which the borrower is now delinquent.", 
    "MonthlyInstallmentPayment": "The monthly payment owed by the borrower if the loan originates.", 
    "InterestRate": "Interest Rate on the loan.", 
    "LastPaymentAmount": "Last total payment amount received.",
    "LoanAmount": "The listed amount of the loan applied for by the borrower.", 
    "LoanStatus": "Current status of the loan.",
    "DerogatoryPublicRecordCount": "Number of derogatory public records.",
    "LoanPurpose": "A category provided by the borrower for the loan request.", 
    "RevolvingLineUtilizationRate": "Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.", 
    "Term": "The number of payments on the loan. Values are in months and can be either 36 or 60.",
    "TotalPaymentsToDate": "Payments received to date for portion of total amount funded by investors.", 
    "VerificationStatus": "Indicates if income was verified.", 

Building a Model

Now that we have the dataset, we'll start working on training a model. We'll be doing data exploration, data preparation, model training, and model validation.

Profiling the Data

Profiling the data can tell us a lot about the internal structure of the dataset. To profile the data, we'll use the pandas_profiling package.

To install the package, we'll execute this command:

!pip install pandas_profiling


The profile is built with this code:

from pandas_profiling import ProfileReport

profile = ProfileReport(data, 
                        title="Credit Risk Analysis Dataset Report",
                            "description": "Lending Club loan data, complete loan data for all loans issued through the 2007-2015."
                            "descriptions": data_dictionary

We passed the data dictionary we built to the profile, it will be saved in the report generated.

Once the report is created, we'll save it to disk as an HTML file.



Right away the profile will tell us a few key details about the dataset:

Data Overview

The profile also contains a few alerts about the data:

Data Alerts

The profile has a description for each variable. Here's the description for the "LoanStatus" variable, which we will use to build the target variable.

Loan Status Variable Description

By using the pandas_profiling package we can avoid writing the most common data analysis code that we write for all datasets.

Preparing the Data

The column that we're interested in is the "LoanStatus" column which tells us the current status of the loan. The values in the column are:

['Fully Paid',
 'Charged Off',
 'Late (31-120 days)',
 'In Grace Period',
 'Late (16-30 days)',
 'Does not meet the credit policy. Status:Fully Paid',
 'Does not meet the credit policy. Status:Charged Off',

We'll be using this column to predict how risky a loan is. To do this, we'll need to create a target column that maps the values above into values that represent the riskiness of the loan. To keep things simple we'll simply create two categories for loans:

  • "safe", for all loans that are current, fully paid off, in grace period, or the payment plan has not started yet
  • "risky", for all other loans

Now we'll map the values above to the categories we want:

data["LoanRisk"] = data["LoanStatus"].replace({
    "Fully Paid": "safe",
    "Charged Off": "risky",
    "Current": "safe",
    "Default": "risky",
    "Late (31-120 days)": "risky",
    "In Grace Period": "safe",
    "Late (16-30 days)": "risky",
    "Does not meet the credit policy. Status:Fully Paid": "safe",
    "Does not meet the credit policy. Status:Charged Off": "risky",
    "Issued": "safe"

(887379, 19)

Now we can remove the "LoanStatus" column since we created a new target column.

data = data.drop("LoanStatus", axis=1)

(887379, 18)

Now that have a defined target column, we can move on to fixing some things in the dataset. From the profile we can see that there are several problems with the data that we need to fix.

The profile tells us that there are rows with missing data. To simplify the data modeling, we'll drop these rows.

data = data.dropna()

(841954, 18)

All of the rows with missing values are now gone.

The "AnnualIncome" column is highly skewed. This is due to some rows which have outlier values, for example the max value for this column is $9,500,000. We'll fix this by removing rows with outlier values in that column. We'll remove the rows with values in this column that are more than three standard deviations from the mean like this:

import numpy as np
from scipy import stats

data = data[(np.abs(stats.zscore(data["AnnualIncome"])) < 3)]

(835167, 18)

Another column in the dataset that is highly skewed is "DebtToIncomeRatio". For example, the maximum value in this column is 9999 which is probably not correct since most of the values in the column have a range between 0 and 100.

We'll use the same code to remove the outlier values for DebtToIncomeRatio.

data = data[(np.abs(stats.zscore(data["DebtToIncomeRatio"])) < 3)]

(835112, 18)

The column "NumberOfDelinquentAccounts" is highly skewed because of a single record that has a value of 14. We'll remove the outliers by simply filtering out the rows with values above 6.

data = data[data["NumberOfDelinquentAccounts"] <= 6]

(835111, 18)

The "HomeOwnership" column has several values that stand in for missing data. These values make up a small portion of the dataset, so we'll just remove the rows instead of doing data imputation.

data = data[data["HomeOwnership"] != "OTHER"]
data = data[data["HomeOwnership"] != "NONE"]
data = data[data["HomeOwnership"] != "ANY"]

(834889, 18)

Looks like we only lost a few hundred rows from the dataset.

The variable "CollectionsInLast12Months" is not highly skewed but it contains values that only appear once or very few times. There are very few samples that have a value above 5, these samples are likely not useful so we'll remove them.

data = data[data["CollectionsInLast12Months"] <= 5]

(834884, 18)

The same is true for the "DelinquenciesInLast2Years" and "DerogatoryPublicRecordCount" variables. There are very few samples with a value above 10 for these variables. We'll remove those samples.

data = data[data["DelinquenciesInLast2Years"] <= 10]
data = data[data["DerogatoryPublicRecordCount"] <= 10]

(834415, 18)

The variable "RevolvingLineUtilizationRate" is a percent whose values must be between 0 and 100. There really isn't a way to use more than 100% of your revolving line of credit. However, this variable has values above 100, we'll remove those samples because they're bad data.

data = data[data["RevolvingLineUtilizationRate"] <= 100.0]

(831103, 18)

Validating the Data

Next, we'll use the deepchecks package to do ML specific checks on the data. These checks are for checking for data issues that might affect an ML model.

Let's install the package:

!pip install deepchecks


Before we can run these checks, we need to specify which variables are categorical and numerical, and which variable is the target variable. We'll create lists of variable names for this purpose.

categorical_variables = [

numerical_variables = [

target_variable = "LoanRisk"

all_variables = categorical_variables + numerical_variables + [target_variable]

assert len(all_variables) == len(data.columns), "A column is missing from the lists of variables."

The assert statement at the end of the code above is a simple check that makes sure that we don't forget to include all of the columns in the lists. If we forget to include a column in the list it will stop execution of the notebook with an error message.

In order to avoid some errors in the training process, we'll need to change the column type of some of the columns in the pandas dataframe to "category".

for column_name in categorical_variables:
    data[column_name] = data[column_name].astype("category")

Lets start the data validation checks.

from deepchecks.tabular import Dataset
from deepchecks.tabular.suites import data_integrity

dataset = Dataset(data, 


The Dataset object contains a reference to the original Dataframe that we've been working with, and also contains the metadata about the columns in the dataframe that is needed to analyze the data.

We'll run the checks on the data like this:

data_integrity_suite = data_integrity()

suite_result =


The checks done by this suite are geared towards datasets used for machine learning.

The results of the data integrity suite look like this:

Data Integrity Suite

The suite contains many checks that execute on the data set, the checks that passed are:

  • Feature Label Correlation, predictive power score is less than 0.8 for all features.
  • Single Value in Column, column does not contain only a single value
  • Special Characters, ratio of samples containing solely special character is less or equal to 0.1%
  • Mixed Nulls, number of different null types is less or equal to 1
  • Mixed Data Types, rare data types in column are either more than 10% or less than 1% of the data
  • Data Duplicates, duplicate data ratio is less or equal to 0%
  • String Length Out Of Bounds, ratio of string length outliers is less or equal to 0%
  • Conflicting Labels, ambiguous sample ratio is less or equal to 0%

The checks are fully explained in the deepchecks documentation.

For now, we're more interested in the checks that did not pass:

Data Integrity Suite Fail

The two checks that didn't pass are:

  • Feature-Feature Correlation, not more than 0 pairs are correlated above 0.9
  • String Mismatch, no string variants

The deepchecks package found that the "LoanAmount" and "MonthlyInstallmentPayment" variables are highly correlated, which makes sense because an increase in loan amount will always cause an increase in payment amount. We can safely drop the MonthlyInstallmentPayment column from the dataset.

data = data.drop("MonthlyInstallmentPayment", axis=1)


(831103, 17)

The deepchecks package also found that the "EmploymentLength" variable contains string values that are similar to each other. For example, two levels found in the categorical variable are "1 year" and "< 1 year". This is a warning that we can ignore because the levels are correctly set.

We're now getting closer to a dataset that we can use to train a model . We'll be using deepchecks to do train/test dataset checks and model checks later on.

Training a Model

To train a model, we'll first create a training and testing set. We'll use 80% of the rows for training and 20% of the rows for testing.

import numpy as np

mask = np.random.rand(len(data)) < 0.80
training_data = data[mask]
testing_data = data[~mask]

(664983, 17)
(166120, 17)

We will be running a test suite on the newly created training and test suites using deepchecks. The deepchecks package requires that we create two Dataset objects, one for the training set and one for the testing set.

train_dataset = Dataset(training_data, 

test_dataset = Dataset(testing_data, 

Now we can run the train-test validation suite.

from deepchecks.tabular.suites import train_test_validation

validation_suite = train_test_validation()

suite_result =, test_dataset)


We'll save the results to files for this suite as well.


The results of the suite look like this:

Train Test Suite

All of the checks in this suite passed:

  • Datasets Size Comparison, Test-Train size ratio is greater than 0.01
  • Category Mismatch Train Test, ratio of samples with a new category is less or equal to 0%
  • Feature Label Correlation Change, Train-Test features' Predictive Power Score difference is less than 0.2
  • Feature Label Correlation Change, Train features' Predictive Power Score is less than 0.7
  • Train Test Feature Drift, categorical drift score < 0.2 and numerical drift score < 0.1
  • Train Test Label Drift, categorical drift score < 0.2 and numerical drift score < 0.1 for label drift Label's drift score Cramer's V is 0
  • New Label Train Test, number of new label values is less or equal to 0
  • String Mismatch Comparison No new variants allowed in test data
  • Train Test Samples Mix, percentage of test data samples that appear in train data is less or equal to 10%,
  • Multivariate Drift, drift value is less than 0.25

These checks are more fully explained in the documentation.

Now that we have verified the contents of the training and testing sets, we're finally ready to traing a model. To do this, we'll need to create separate dataframes for the predictor and target columns:

feature_columns = categorical_variables + numerical_variables

X_train = training_data[feature_columns]
y_train = training_data[target_variable]

X_test = testing_data[feature_columns]
y_test = testing_data[target_variable]

We'll be using the LightGBM package to train a GBM model and the FLAML package for doing automated machine learning. Let's install the packages:

!pip install lightgbm
!pip install flaml


Let's train a model using the default hyperparameters to have a baseline.

from lightgbm import LGBMClassifier

model = LGBMClassifier()

model =, y_train) 
CPU times: user 13.1 s, sys: 657 ms, total: 13.8 s
Wall time: 3.46 s

Now let's calculate the classification metrics for this simple model:

from sklearn.metrics import classification_report

y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

       risky       0.92      0.14      0.24     11456
        safe       0.94      1.00      0.97    154664

    accuracy                           0.94    166120
   macro avg       0.93      0.57      0.61    166120
weighted avg       0.94      0.94      0.92    166120

CPU times: user 8.96 s, sys: 191 ms, total: 9.15 s
Wall time: 6.52 s

The "safe" class has good metrics but the "risky" class does not. This is due to the fact that the classes are imbalanced.

We'll try to fix this issue by doing automated ML with the FLAML package. The automated hyperparameter search will hopefully find some parameters that can improve the metrics of the "risky" class.

The settings are:

  • time_budget: amount of time allowed for the auto ML algorithm to run
  • metric: the metric that should be maximized by the auto ML alogorithm
  • estimator_list: the types of estimators that can be used by FLAML, in this case we only want to try LightGBM
  • task: the type of task that the estimator should be solving
  • log_file_name: name of the log file output by the auto ML algorithm
  • seed: the random seed to be used by the auto ML algorithm
from flaml import AutoML

automl = AutoML()

settings = {
    "time_budget": 1200,
    "metric": "roc_auc",
    "estimator_list": ["lgbm"],
    "task": "classification",
    "log_file_name": "experiment.log",
    "seed": 42
}, y_train=y_train, **settings)


The hyperparameter search has found an optimal set of hyperparametes using the training set and cross validation. These are the hyperparameters found:

{'n_estimators': 10707,
 'num_leaves': 7,
 'min_child_samples': 62,
 'learning_rate': 0.24185440044608203,
 'log_max_bin': 10,
 'colsample_bytree': 0.9914098492087268,
 'reg_alpha': 2.551067627605118,
 'reg_lambda': 0.0010846951681516895}

Let's train a model using the optimal hyperparameters:

model = LGBMClassifier(**automl.best_config)

model =, y_train) 

Let's get the the classification metrics for the best model:

y_pred = model.predict(X_test)

print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

       risky       0.89      0.46      0.61     11456
        safe       0.96      1.00      0.98    154664

    accuracy                           0.96    166120
   macro avg       0.93      0.73      0.79    166120
weighted avg       0.96      0.96      0.95    166120

Validating the Model

Deepchecks is also able to validate the model with the model_evaluation suite of checks.

from deepchecks.tabular.suites import model_evaluation

evaluation_suite = model_evaluation()

suite_result =, test_dataset, model)

Packaging the Model Files

Let's serialize the best model to disk:

import joblib

joblib.dump(model, "../credit_risk_model/model_files/model.joblib")
CPU times: user 1.33 s, sys: 82.1 ms, total: 1.41 s
Wall time: 226 ms

!ls -la ../credit_risk_model/model_files
total 82824
drwxr-xr-x  8 brian  staff      256 Jan 15 20:07 .
drwxr-xr-x  8 brian  staff      256 Dec 13 11:15 ..
-rw-r--r--@ 1 brian  staff     6148 Jan 15 19:09 .DS_Store
-rw-r--r--  1 brian  staff  9426966 Jan 15 19:12 data_exploration_report.html
-rw-r--r--  1 brian  staff  7750291 Jan 15 19:15 deepchecks_data_integrity_results.html
-rw-r--r--@ 1 brian  staff  7964754 Jan 15 20:06 deepchecks_model_evaluation_results.html
-rw-r--r--  1 brian  staff  7793791 Jan 15 19:17 deepchecks_train_test_results.html
-rw-r--r--  1 brian  staff  9452707 Jan 15 20:07 model.joblib

The serialized model is 9.5 megabytes in size and took 226 milliseconds to write to disk. This is important to note because we will need to deserialize the model later in order to make predictions with it.

In the process of training this model, we created a few files. To be able to use these files later, we'll package them up and save them in a location that can be accessed by the prediction code later. We'll be using a .zip file for this purpose.

import shutil

shutil.make_archive("../credit_risk_model/model_files/1", "zip", "../credit_risk_model/model_files")

The command created a .zip file with all of the files in the model_files folder. The name of the folder is "" this is just a simple name that denotes that it is the first model trained for the credit_risk_model package.

Now that we have the model files in a .zip file, we can delete the original files from the folder:

!rm ../credit_risk_model/model_files/data_exploration_report.html
!rm ../credit_risk_model/model_files/deepchecks_data_integrity_results.html
!rm ../credit_risk_model/model_files/deepchecks_train_test_results.html
!rm ../credit_risk_model/model_files/deepchecks_model_evaluation_results.html
!rm ../credit_risk_model/model_files/model.joblib

This packaging process ensures that all of the results of the model training process end up in one archive that we can use later. All of the data and model check results are packaged along with the serialized model so its easy to review the model training process.

Making Predictions with the Model

We now have a working model that accepts Pandas dataframes as input and also returns predictions in dataframes. This is useful in the context of model training, but makes integrating the model with other software components a lot more complicated. To make the model easier to use, we'll need to create input and output schemas for the model and also create a wrapper class that provides a consistent interface for the model.

We'll create the model's input and output schemas with the pydantic package, which is a package used for data validation. By creating the schemas using this package we're able to fully document the inputs that the model accepts and the expected outputs of the model we're going to deploy.

To begin, we'll define the allowed values for the categorical variables.

from pydantic import BaseModel, Field
from enum import Enum

class EmploymentLength(str, Enum): 
    """Employment length in years."""
    less_than_1_year = "< 1 year"
    one_year = "1 year"
    two_years = "2 years"
    three_years = "3 years"
    four_years = "4 years"
    five_years = "5 years"
    six_years = "6 years"
    seven_years = "7 years"
    eight_years = "8 years"
    nine_years = "9 years"
    ten_years_or_more = "10+ years"

class HomeOwnership(str, Enum):
    """The home ownership status provided by the borrower during registration."""
    RENT = "RENT"
    OWN = "OWN"

class LoanPurpose(str, Enum):
    """A category provided by the borrower for the loan request."""
    debt_consolidation = "debt_consolidation"
    credit_card = "credit_card"
    home_improvement = "home_improvement"
    other = "other"
    major_purchase = "major_purchase"
    small_business = "small_business"
    car = "car"
    medical = "medical"
    moving = "moving"
    vacation = "vacation"
    wedding = "wedding"
    house = "house"
    renewable_energy = "renewable_energy"
    educational = "educational"

class Term(str, Enum):
    """The number of payments on the loan."""
    thirty_six_months = " 36 months"
    sixty_months = " 60 months"

class VerificationStatus(str, Enum):
    """Indicates if income was verified."""
    source_verified = "Source Verified"
    verified = "Verified"
    not_verified = "Not Verified"

The dataset contains 5 categorical variables, so we defined 5 Enum classes that contain the values accepted for these variables. Each enumeration has a key and value, with the value being the value as the model expects to see it. By using enumerated values, we can ensure that the model can only receive values in these inputs that it has previously seen in the training set.

Now that we have the categorical variables defined, we can define the input schema for the model:

class CreditRiskModelInput(BaseModel):
    """Inputs for predicting credit risk."""
    annual_income: int = Field(ge=1896, le=273000, description="The self-reported annual income provided by the borrower during registration.")
    collections_in_last_12_months: int = Field(ge=0, le=20, description="Number of collections in 12 months excluding medical collections.")
    delinquencies_in_last_2_years: int = Field(ge=0, le=39, description="The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years.")
    debt_to_income_ratio: float = Field(ge=0.0, le=42.64, description="A ratio calculated using the borrower’s total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower’s self-reported monthly income.")
    employment_length: EmploymentLength = Field(description="Employment length in years.")
    home_ownership: HomeOwnership = Field(description="The home ownership status provided by the borrower during registration. Our values are: RENT, OWN, MORTGAGE, OTHER.")
    number_of_delinquent_accounts: int = Field(ge=0, le=6, description="The number of accounts on which the borrower is now delinquent.")
    interest_rate: float = Field(ge=5.32, le=28.99, description="Interest Rate on the loan.")
    last_payment_amount: float = Field(ge=0.0, le=36475.59, description="Last total payment amount received.")
    loan_amount: int = Field(ge=500.0, le=35000.0, description="The listed amount of the loan applied for by the borrower.")
    derogatory_public_record_count: int = Field(ge=0.0, le=86.0, description="Number of derogatory public records.")
    loan_purpose: LoanPurpose = Field(description="A category provided by the borrower for the loan request.")
    revolving_line_utilization_rate: float = Field(ge=0.0, le=892.3, description="Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit.")
    term: Term = Field(description="The number of payments on the loan. Values are in months and can be either 36 or 60.")
    total_payments_to_date: float = Field(ge=0.0, le=57777.58, description="Payments received to date for portion of total amount funded by investors.")
    verification_status: VerificationStatus = Field(description="Indicates if income was verified.")

The schema is called "CreditRiskModelInput" and contains fields for each variable found in the dataset. We're using the Enum classes we defined above for the categorical fields, and we defined fields for all of the numerical variables. Each numerical field has a range of allowed values that matches the range of the numerical variable found in the dataset. Each field also has a description of the variable that helps the user of the model to correctly feed data to the model.

The process of creating an input data schema makes the model much more used friendly and exposes information found in the dataset that the model was originally trained on to the user of the model. Doing this also allows us to build documentation for the model service automatically.

Now that we have the model's input schema defined, we'll define the output schema:

class CreditRisk(str, Enum):
    """Indicates whether or not loan is risky."""
    safe = "safe"
    risky = "risky"

class CreditRiskModelOutput(BaseModel):
    credit_risk: CreditRisk = Field(description="Whether or not the loan is risky.")

The model is a classification model and the output schema simply enumerates the classes that the model can predict.

We now have the input and output schemas defined, now we can tie it all together by creating a wrapper class for the model. The ml_base package defines a simple base class for model prediction code that allows us to "wrap" the prediction code in a class that follows the MLModel interface. This interface publishes this information about the model:

  • Qualified Name, a unique identifier for the model
  • Display Name, a friendly name for the model used in user interfaces
  • Description, a description for the model
  • Version, semantic version of the model codebase
  • Input Schema, an object that describes the model\'s input data
  • Output Schema, an object that describes the model\'s output schema

The MLModel interface also dictates that the model class implements two methods:

  • __init__, the initialization method which loads any model artifacts needed to make predictions
  • predict, prediction method that receives model inputs makes a prediction and returns model outputs

By using the MLModel base class we'll be able to do more interesting things later with the model. If you'd like to learn more about the ml_base package, here is some documentation about it.

To install the ml_base package, execute this command:

!pip install ml_base


We'll define the wrapper class like this:

import sys
import os
import joblib
import pandas as pd
from ml_base import MLModel
import zipfile

__file__ = os.path.join(os.path.dirname(os.path.realpath(os.path.abspath(''))), "credit_risk_model", "prediction", "")

class CreditRiskModel(MLModel):
    """Prediction logic for the Credit Risk Model."""

    def display_name(self) -> str:
        """Return display name of model."""
        return "Credit Risk Model"

    def qualified_name(self) -> str:
        """Return qualified name of model."""
        return "credit_risk_model"

    def description(self) -> str:
        """Return description of model."""
        return "Model to predict the credit risk of a loan."

    def version(self) -> str:
        """Return version of model."""
        return "0.1.0"

    def input_schema(self):
        """Return input schema of model."""
        return CreditRiskModelInput

    def output_schema(self):
        """Return output schema of model."""
        return CreditRiskModelOutput

    def __init__(self):
        """Class constructor that loads and deserializes the model parameters."""
        dir_path = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
        file_path = os.path.join(dir_path, "model_files", "")

        with zipfile.ZipFile(file_path) as zf:
            if "model.joblib" not in zf.namelist():
                raise ValueError("Could not find model file in zip file.")
            model_file ="model.joblib")
            self._model = joblib.load(model_file)

    def predict(self, data: CreditRiskModelInput) -> CreditRiskModelOutput:
        """Make a prediction with the model.

            data: Data for making a prediction with the model.

            The result of the prediction.

        if type(data) is not CreditRiskModelInput:
            raise ValueError("Input must be of type 'CreditRisk'")

        X = pd.DataFrame([[

        categorical_variables = ["EmploymentLength", 

        for column_name in categorical_variables:
            X[column_name] = X[column_name].astype("category")

        y_hat = self._model.predict(X)[0]

        return CreditRiskModelOutput(credit_risk=CreditRisk[y_hat])

We can make a prediction with the model by first building a CreditRiskModelInput object:

model_input = CreditRiskModelInput(

Next, we'll instantiate the model class we defined above:

model = CreditRiskModel()
CPU times: user 372 ms, sys: 29.2 ms, total: 401 ms
Wall time: 113 ms

Notice that the model object took 113 milliseconds to be instantiated. This is because the model parameters take a lot of disk space and take a while to load from the hard drive. This is something that we'll need to deal with later.

We'll use the CreditRiskModelInput instance to make a prediction like this:

prediction = model.predict(model_input)

CreditRiskModelOutput(credit_risk=< 'safe'>)

The model predicted that the loan is "safe".

Creating a RESTful Service

Now that we have a model, we can deploy it in a service that allows clients to make predictions. To do this, we won't need to write any extra code, we can leverage the rest_model_service package to provide the RESTful API for the service. You can learn more about the package in this blog post.

To install the package, execute this command:

!pip install rest_model_service


To create a service for our model, all that is needed is that we add a YAML configuration file to the project. The configuration file looks like this:

service_title: Credit Risk Model Service
description: "Service hosting the Credit Risk Model."
version: "0.1.0"
  - qualified_name: credit_risk_model
    class_path: credit_risk_model.prediction.model.CreditRiskModel
    create_endpoint: true

At the root of the YAML, the "service_title" field is the name of the service as it will appear in the documentation. The "description" and "version"fields will also be used to create the service documentation.

The models field is an array that contains the details of the models we would like to deploy in the service. The "qualified_name" field is the name we gave to the model. The "class_path" field points at the MLModel class that implements the model's prediction logic, in this case it is pointing to the class we built earlier in this blog post. The "create_endpoint" field tells the service to create an endpoint for the model.

Using the configuration file, we can create an OpenAPI specification file for the model service by executing these commands:

export PYTHONPATH=./
generate_openapi --configuration_file=./configuration/rest_configuration.yaml --output_file="service_contract.yaml"

The service_contract.yaml file is generated and contains the OpenAPI specification that was generated for the model service. The specification contains a description of the model's endpoint. The model's input and output schemas are automatically extracted and added to the specification. The OpenAPI specification file generated can be found at the root of the github repository in the file named service_contract.yaml

To run the service locally, execute these commands:

export REST_CONFIG=./configuration/rest_configuration.yaml
uvicorn rest_model_service.main:app

The service process starts up and can be accessed in a web browser at The service renders the OpenAPI specification as a webpage that looks like this:

Service Documentation

By using the MLModel base class provided by the ml_base package and the REST service framework provided by the rest_model_service package we're able to quickly stand up a service to host the model.

We can make a prediction using the model running in the service with this command:

!curl -X 'POST' \
  '' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{ \
      \"annual_income\": 273000, \
      \"collections_in_last_12_months\": 20, \
      \"delinquencies_in_last_2_years\": 39, \
      \"debt_to_income_ratio\": 42.64, \
      \"employment_length\": \"< 1 year\", \
      \"home_ownership\": \"MORTGAGE\", \
      \"number_of_delinquent_accounts\": 6, \
      \"interest_rate\": 28.99, \
      \"last_payment_amount\": 36475.59, \
      \"loan_amount\": 35000,  \
      \"derogatory_public_record_count\": 86, \
      \"loan_purpose\": \"debt_consolidation\", \
      \"revolving_line_utilization_rate\": 892.3, \
      \"term\": \" 36 months\", \
      \"total_payments_to_date\": 57777.58, \
      \"verification_status\": \"Source Verified\" \

The model returned a prediction of "safe" for the loan.

Understanding Health Checks

The service is able exposes health information about itself through health endpoints. The health endpoints are:

  • /api/health: indicates whether the service process is running. This endpoint will return a 200 status once the service has started.
  • /api/health/ready: indicates whether the service is ready to respond to requests. This endpoint will return a 200 status only if all the models and decorators have finished being instantiated without errors.
  • /api/health/startup: indicates whether the service is started. This endpoint will return a 200 status only if all the models and decorators have finished being instantiated without errors.

These endpoints are important for our use case because our model takes a while to load and become ready to be served over the API. The service will not be ready to serve traffic for a while, so the readiness and startup checks will fail until the models are ready.

The service is running so we'll try out each endpoint with a request:

!curl -X 'GET' \
  '' \
  -H 'accept: application/json'

The health endpoint returned a status of "HEALTHY".

!curl -X 'GET' \
  '' \
  -H 'accept: application/json'

The readiness status endpoint returned a status of "ACCEPTING_TRAFFIC".

!curl -X 'GET' \
  '' \
  -H 'accept: application/json'

The startup status endpoint returned a status of "STARTED".

During normal operation, the health endpoints are not very interesting. However, in special situations they are very useful. For example, if a model takes a long time to start up, the startup check endpoint will not return a 200 status response until each model is initiated and ready to make predictions. The readiness endpoint will also not return a 200 status until the model is ready. We'll use the healtcheck endpoints to integrated with Kubernetes.

Creating a Docker Image

Now that we have a working model and model service, we'll need to deploy it somewhere. We'll start by deploying the service locally using Docker.

Let's create a docker image and run it locally. The docker image is generated using instructions in the Dockerfile:

# syntax=docker/dockerfile:1
FROM python:3.9-slim


LABEL org.opencontainers.image.title="Health Checks for ML Models"
LABEL org.opencontainers.image.description="Health checks for ML models."
LABEL org.opencontainers.image.created=$BUILD_DATE
LABEL org.opencontainers.image.authors=""
LABEL org.opencontainers.image.source=""
LABEL org.opencontainers.image.version="0.1.0"
LABEL org.opencontainers.image.licenses="MIT License"

WORKDIR /service

ARG USERNAME=service-user

# install packages
RUN apt-get update \
    && apt-get install --assume-yes --no-install-recommends sudo \
    && apt-get install --assume-yes --no-install-recommends git \
    && apt-get install -y --no-install-recommends apt-utils \
    && apt-get install -y --no-install-recommends libgomp1 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# create a user
RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \
    && echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
    && chmod 0440 /etc/sudoers.d/$USERNAME

# installing dependencies
COPY ./service_requirements.txt ./service_requirements.txt
RUN pip install -r service_requirements.txt

# copying code and license
COPY ./credit_risk_model ./credit_risk_model


CMD ["uvicorn", "rest_model_service.main:app", "--host", "", "--port", "8000"]

The Dockerfile is used by this docker command to create a docker image:

!docker build -t credit_risk_model_service:0.1.0 ../


To make sure everything worked as expected, we'll look through the docker images in our system:

!docker image ls | grep credit_risk_model_service
credit_risk_model_service         0.1.0     fc3c3c747e2b   4 seconds ago   614MB

The credit_risk_model_service image is listed. Next, we'll start the image to see if the service is working correctly.

!docker run -d \
    -p 8000:8000 \
    -e REST_CONFIG=./configuration/rest_configuration.yaml \
    -v $(pwd)/../configuration:/service/configuration \
    --name credit_risk_model_service \

To make sure the server process started up correctly, we'll look at the logs:

!docker logs credit_risk_model_service
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on (Press CTRL+C to quit)

The logs look good and the service is up and running.

The service should be accessible on port 8000 of localhost, so we'll try to make a prediction using the curl command:

!curl -X 'POST' \
  '' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{ \
      \"annual_income\": 273000, \
      \"collections_in_last_12_months\": 20, \
      \"delinquencies_in_last_2_years\": 39, \
      \"debt_to_income_ratio\": 42.64, \
      \"employment_length\": \"< 1 year\", \
      \"home_ownership\": \"MORTGAGE\", \
      \"number_of_delinquent_accounts\": 6, \
      \"interest_rate\": 28.99, \
      \"last_payment_amount\": 36475.59, \
      \"loan_amount\": 35000,  \
      \"derogatory_public_record_count\": 86, \
      \"loan_purpose\": \"debt_consolidation\", \
      \"revolving_line_utilization_rate\": 892.3, \
      \"term\": \" 36 months\", \
      \"total_payments_to_date\": 57777.58, \
      \"verification_status\": \"Source Verified\" \

The model predicted that the loan is safe.

We're done with the docker container, so we'll shut down the model service container.

!docker kill credit_risk_model_service
!docker rm credit_risk_model_service

Creating a Kubernetes Cluster

To show the system in action, we’ll deploy the service to a Kubernetes cluster. A local cluster can be easily started by using minikube. Installation instructions can be found here.

To start the minikube cluster execute this command:

!minikube start
😄  minikube v1.28.0 on Darwin 13.0.1Using the docker driver based on existing profile
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
🔄  Restarting existing docker container for "minikube" ...
🐳  Preparing Kubernetes v1.25.3 on Docker 20.10.20 ...
🔎  Verifying Kubernetes components...
    ▪ Using image image image
💡  Some dashboard features require the metrics-server addon. To enable all features please run:

    minikube addons enable metrics-server

🌟  Enabled addons: storage-provisioner, default-storageclass, dashboard
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

Let's view all of the pods running in the minikube cluster to make sure we can connect.

!kubectl get pods -A
NAMESPACE              NAME                                        READY   STATUS    RESTARTS         AGE
kube-system            coredns-565d847f94-2v6l9                    0/1     Running   10 (3d20h ago)   11d
kube-system            etcd-minikube                               0/1     Running   10 (3d20h ago)   11d
kube-system            kube-apiserver-minikube                     0/1     Running   10 (3d20h ago)   11d
kube-system            kube-controller-manager-minikube            0/1     Running   10 (25s ago)     11d
kube-system            kube-proxy-ztbgd                            1/1     Running   10 (25s ago)     11d
kube-system            kube-scheduler-minikube                     0/1     Running   10 (25s ago)     11d
kube-system            storage-provisioner                         1/1     Running   18 (25s ago)     11d
kubernetes-dashboard   dashboard-metrics-scraper-b74747df5-x559p   1/1     Running   9 (25s ago)      11d
kubernetes-dashboard   kubernetes-dashboard-57bbdc5f89-9jvln       1/1     Running   14 (3d20h ago)   11d

The pods running the kubernetes dashboard and other cluster services appear in the kube-system and kubernetes-dashboard namespaces.

Creating a Kubernetes Namespace

Now that we have a cluster and are connected to it, we'll create a namespace to hold the resources for our model deployment. The resource definition is in the kubernetes/namespace.yaml file. To apply the manifest to the cluster, execute this command:

!kubectl create -f ../kubernetes/namespace.yaml
namespace/model-services created
resourcequota/model-services-resource-quota created

The namespace was created, alongside with a ResourceQuota which limits the amount of resources that can be taken by objects within the namespace.

To take a look at the namespaces, execute this command:

!kubectl get namespace
NAME                   STATUS   AGE
default                Active   11d
kube-node-lease        Active   11d
kube-public            Active   11d
kube-system            Active   11d
kubernetes-dashboard   Active   11d
model-services         Active   0s

The new namespace appears in the listing along with other namespaces created by default by the system. To use the new namespace for the rest of the operations, execute this command:

!kubectl config set-context --current --namespace=model-services
Context "minikube" modified.

Creating a Kubernetes Deployment and Service

The model service is deployed by using Kubernetes resources. These are:

  • Model Service ConfigMap: a set of configuration options, in this case it is a simple YAML file that will be loaded into the running container as a volume mount. This resource allows us to change the configuration of the model service without having to modify the Docker image. The configuration file will overwrite the configuration files that were included with the Docker image.
  • Deployment: a declarative way to manage a set of pods, the model service pods are managed through the Deployment. This deployment includes the model service as well as the OPA service running as a sidecar container.
  • Service: a way to expose a set of pods in a Deployment, the model services is made available to the outside world through the Service.

The Deployment resource will be created with some special options that can leverage the health endpoints of the model service. These options look like this:

    scheme: HTTP
    path: /api/health
    port: 8000
  initialDelaySeconds: 0
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 5
  successThreshold: 1
    scheme: HTTP
    path: /api/health/ready
    port: 8000
  initialDelaySeconds: 0
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 5
  successThreshold: 1
    scheme: HTTP
    path: /api/health/startup
    port: 8000
  initialDelaySeconds: 0
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 5
  successThreshold: 1

This is not the complete YAML file, the full Deployment is defined in the ./kubernetes/model_service.yaml file.

The model service container has options defined for each type of health check. Each type of health check is configured in the same way. The options are:

  • initialDelaySeconds: This option tells Kubernetes how long to wait after container startup to start calling the health check.
  • periodSeconds: This option tells how often to call the health check endpoint.
  • timeoutSeconds: This option tell how long to wait for a response from the service before failing the health check.
  • failureThreshold: This option tells how many times the health check must fail before Kubernetes labels the container as unhealthy and restarts the pod.
  • successThreshold: This option tells how many times the health check must succeed before Kubernetes labels the container as healthy.

We decided to have Kubernetes check 5 times before labelling the container as unhealthy, with a period of 5 seconds. This means that the service has 25 seconds for the model to finish loading. This is a value we know would work with this model, based on our timing measurements above.

We're almost ready to start the model service, but before starting it we'll need to send the docker image from the local docker daemon to the minikube image cache:

!minikube image load credit_risk_model_service:0.1.0

We can view the images in the minikube cache with this command:

!minikube image ls | grep credit_risk_model_service

The model service resources are created within the Kubernetes cluster with this command:

!kubectl apply -f ../kubernetes/model_service.yaml
configmap/model-service-configuration created
deployment.apps/credit-risk-model-deployment created
service/credit-risk-model-service created

Lets view the Deployment to see if it is available yet:

!kubectl get deployments
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
credit-risk-model-deployment   0/2     2            0           2s

Looks like the replicas are not ready yet. Let's wait a bit and try again:

!kubectl get deployments
NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
credit-risk-model-deployment   2/2     2            2           23s

After the model service finished loading the model, it switched to the "READY" and "STARTED" state, which made the replicas available to serve traffic.

To get an idea of how the service went through the startup process, let's look a the service logs. Let's get the names of the pods that are running the service:

!kubectl get pods
NAME                                            READY   STATUS    RESTARTS   AGE
credit-risk-model-deployment-55654498f4-2bw9k   1/1     Running   0          29s
credit-risk-model-deployment-55654498f4-rxznw   1/1     Running   0          29s

Using one of the pod names, we'll get the logs from Kubernetes:

!kubectl logs credit-risk-model-deployment-55654498f4-2bw9k -c credit-risk-model
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on (Press CTRL+C to quit)
INFO: - "GET /api/health/startup HTTP/1.1" 503 Service Unavailable
INFO: - "GET /api/health/startup HTTP/1.1" 503 Service Unavailable
INFO: - "GET /api/health/startup HTTP/1.1" 200 OK
INFO: - "GET /api/health/ready HTTP/1.1" 200 OK
INFO: - "GET /api/health/ready HTTP/1.1" 200 OK
INFO: - "GET /api/health HTTP/1.1" 200 OK
INFO: - "GET /api/health/ready HTTP/1.1" 200 OK
INFO: - "GET /api/health HTTP/1.1" 200 OK
INFO: - "GET /api/health/ready HTTP/1.1" 200 OK
INFO: - "GET /api/health HTTP/1.1" 200 OK
INFO: - "GET /api/health/ready HTTP/1.1" 200 OK
INFO: - "GET /api/health HTTP/1.1" 200 OK

Looks like the process started up correctly and then the /api/health/startup endpoint was called three times, succeeding in the last request. Right after the startup check succeeded, the /api/health/ready endpoint was called, and it immediately succeeded. Right after that, the /api/health endpoint was called and it also succeeded. This startup process ensured that the model service was not accessed by clients before it finished loading the models into memory.

To access the model service, we created a Kubernetes service. The service details look like this:

!kubectl get services
NAME                        TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
credit-risk-model-service   NodePort   <none>        80:31004/TCP   68s

Minikube can expose the service on a local port, but we need to run a proxy process. The proxy is started like this:

minikube service credit-risk-model-service --url -n model-services

The command outputs this URL:

The command must keep running to keep the tunnel open to the running model service in the minikube cluster.

We can send a request to the model service through the local endpoint like this:

!curl -X 'POST' \
  '' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d "{ \
      \"annual_income\": 273000, \
      \"collections_in_last_12_months\": 20, \
      \"delinquencies_in_last_2_years\": 39, \
      \"debt_to_income_ratio\": 42.64, \
      \"employment_length\": \"< 1 year\", \
      \"home_ownership\": \"MORTGAGE\", \
      \"number_of_delinquent_accounts\": 6, \
      \"interest_rate\": 28.99, \
      \"last_payment_amount\": 36475.59, \
      \"loan_amount\": 35000,  \
      \"derogatory_public_record_count\": 86, \
      \"loan_purpose\": \"debt_consolidation\", \
      \"revolving_line_utilization_rate\": 892.3, \
      \"term\": \" 36 months\", \
      \"total_payments_to_date\": 57777.58, \
      \"verification_status\": \"Source Verified\" \

The health check endpoints of the model service are also available to clients:

!curl -X 'GET' \
  '' \
  -H 'accept: application/json'

Deleting the Resources

We're done working with the Kubernetes resources, so we will delete them and shut down the cluster.

To delete the model service pods, execute this command:

!kubectl delete -f ../kubernetes/model_service.yaml
configmap "model-service-configuration" deleted
deployment.apps "credit-risk-model-deployment" deleted
service "credit-risk-model-service" deleted

To delete the model-services namespace, delete this command:

!kubectl delete -f ../kubernetes/namespace.yaml
namespace "model-services" deleted
resourcequota "model-services-resource-quota" deleted

To shut down the Kubernetes cluster:

!minikube stop
✋  Stopping node "minikube"  ...
🛑  Powering off "minikube" via SSH ...
🛑  1 node stopped.


In this blog post we showed how to deal with a common issue that arises when a large model is deployed. When the model parameters take a long time to load, the model service needs to make sure that no clients are depending on it to provide predictions until it is finished starting up. We accomplished this on the Kubernetes platform by adding health check endpoints to the model service and configuring Kubernetes to check on the endpoints. By doing this we are able to guarantee that the model service will only become available to clients once it has finished starting up.

In order to build the health checks, the model did not need to change at all. We were able to build the logic into the model service package, which means that the model prediction logic did not have to change to deal with this requirement. We were able to isolate a deployment concern from the model that we were trying to deploy. This also means that we can reuse the model to make predictions in other contexts and not have this extra logic being carried along with it.