Appendix 2. (Advanced) Create a microservice with dependencies in Python

Creating a virtual environment

Similar to packrat in R, the virtualenv package manages virtual environments in Python.

  1. Install virtualenv if it is not already installed
$ [sudo] pip install virtualenv
  2. Create a virtual environment on your local system

    1. Go to or create the directory that will hold all your files
$ virtualenv ENV
  3. Activate your virtual environment
$ source ENV/bin/activate
  4. Later, deactivate your virtual environment when you are done
$ deactivate
  5. For more details, please refer to the virtualenv documentation

Installing frequently used packages

  1. scikit-learn is very frequently used in building models. You can install it using pip inside your virtualenv. If your models do not need scikit-learn or any other particular package, you can skip this step. It can take a few minutes.
$ source ENV/bin/activate
$ pip install numpy
$ pip install scipy
$ pip install 'scikit-learn[alldeps]'
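To confirm the packages installed correctly, you can start Python inside the activated environment and import each one (a quick sanity check, not a required step):

```python
# Sanity check: import each package and print its version.
import numpy
import scipy
import sklearn

print(numpy.__version__, scipy.__version__, sklearn.__version__)
```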

Developing a model

  1. In this example, we will use the Boston housing data, bundled with scikit-learn, to predict property prices in Boston.
# *************************************
#              READ DATA
# *************************************
from sklearn import datasets
from sklearn.utils import shuffle
import numpy as np
boston = datasets.load_boston()
X, y = shuffle(boston.data, boston.target, random_state=13)
X = X.astype(np.float32)
# We already know good variables from an analysis not shown here: LSTAT, RM, DIS, AGE
# LSTAT: % lower status of the population
# RM: average number of rooms per dwelling
# DIS: weighted distances to five Boston employment centres
# AGE: proportion of owner-occupied units built prior to 1940
X = X[:, [12, 5, 7, 6]]

# *************************************
#           BUILD A ML MODEL
# *************************************
from sklearn import ensemble
clf = ensemble.GradientBoostingRegressor()
clf.fit(X, y)

The relative importance of each variable in the fitted model is available through the estimator's feature_importances_ attribute.
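As a self-contained sketch of inspecting those importances (synthetic data stands in for the Boston results, which are not reproduced here):

```python
# Illustrative sketch (not the original analysis): fit a small GBM on
# synthetic data and print the relative importance of each feature.
import numpy as np
from sklearn import ensemble

rng = np.random.RandomState(13)
X_demo = rng.rand(200, 4)
# The target depends mostly on column 0, so it should rank as most important.
y_demo = 3.0 * X_demo[:, 0] + 0.5 * X_demo[:, 1] + rng.normal(0, 0.05, 200)

clf_demo = ensemble.GradientBoostingRegressor(random_state=13)
clf_demo.fit(X_demo, y_demo)

# Feature names reused from the Boston example for illustration only.
for name, imp in zip(['LSTAT', 'RM', 'DIS', 'AGE'], clf_demo.feature_importances_):
    print('%s: %.3f' % (name, imp))
```

The importances are normalized to sum to 1, so they can be read directly as relative shares.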


Saving the model

import joblib
joblib.dump(clf, 'boston_property_pricing_gbm.pkl')


  1. The scoring script below loads the saved model, runs it and returns results. The only requirement is that this file must have a function named run with one input argument.
import joblib
clf = joblib.load('boston_property_pricing_gbm.pkl')

def run(data):
    model_result = clf.predict([[data['LSTAT'], data['RM'], data['DIS'], data['AGE']]])
    return {'predicted_property_price': round(model_result[0], 2)}
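To sketch how the hosting service presumably invokes run (the exact mechanism is an assumption): it passes a dict of inputs and expects a JSON-serializable dict back. The stub model below stands in for the pickled GBM so the example runs on its own, and the input values are illustrative.

```python
# Hedged sketch of the run() contract: dict in, JSON-serializable dict out.
import json

class StubModel:
    """Stand-in for the pickled GBM; returns a toy linear 'price'."""
    def predict(self, rows):
        return [sum(rows[0]) / 10.0]

clf = StubModel()

def run(data):
    model_result = clf.predict([[data['LSTAT'], data['RM'], data['DIS'], data['AGE']]])
    return {'predicted_property_price': round(model_result[0], 2)}

# Illustrative input values, not real Boston predictions.
payload = {'LSTAT': 4.98, 'RM': 6.575, 'DIS': 4.09, 'AGE': 65.2}
print(json.dumps(run(payload)))
```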

Preparing requirements.txt

$ pip freeze > requirements.txt

The resulting file pins every package in the environment to its exact version.
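An illustrative requirements.txt for this environment might look as follows (the version numbers are placeholders; pip freeze will record whatever versions were actually installed, along with their dependencies):

```text
numpy==1.16.4
scipy==1.2.1
scikit-learn==0.20.3
```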


Later we need to upload the scoring script (knowledge file), requirements.txt (requirement file) and boston_property_pricing_gbm.pkl (a miscellaneous file). Note that the script used to create the GBM model does not have to be uploaded.