Appendix 2. (Advanced) Create a microservice with dependencies in Python

Creating a virtual environment

Similar to packrat in R, the virtualenv package manages virtual environments in Python.

  1. Install virtualenv if it is not installed yet
$ [sudo] pip install virtualenv
  2. Create a virtual environment on your local system (a consolidated session is sketched after this list)

    1. Go to, or create, the directory where you will keep all your files, then run
$ virtualenv ENV
  3. Activate your virtual environment
$ source ENV/bin/activate
  4. Later, deactivate your virtual environment with
$ deactivate
  5. For more details, please refer to https://virtualenv.pypa.io/en/stable/userguide/
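
Taken together, a typical session looks like the following (boston_service is a hypothetical project directory name used only for illustration):

$ mkdir boston_service && cd boston_service
$ virtualenv ENV
$ source ENV/bin/activate
(ENV) $ python --version
(ENV) $ deactivate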

Installing frequently used packages

  1. scikit-learn is very frequently used in building models. You can install it using pip inside your virtualenv. If your models do not need scikit-learn or any other particular package, you can skip this step. This step can take a few minutes.
$ source ENV/bin/activate
$ pip install numpy
$ pip install scipy
$ pip install "scikit-learn[alldeps]"
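
To verify that the installation succeeded, you can import the packages and print the scikit-learn version (an optional sanity check, not a required step):

$ python -c "import numpy, scipy, sklearn; print(sklearn.__version__)"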

Developing a model

  1. In this example, we will use the Boston housing data to predict property prices in Boston (http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regression.html)
# *************************************
#              READ DATA
# *************************************
from sklearn import datasets
from sklearn.utils import shuffle
import numpy as np
boston = datasets.load_boston()
X, y = shuffle(boston.data, boston.target, random_state=13)
X = X.astype(np.float32)
# We already know good variables from an analysis not shown here: LSTAT, RM, DIS, AGE
# LSTAT: % lower status of the population
# RM: average number of rooms per dwelling
# DIS: weighted distances to five Boston employment centres
# AGE: proportion of owner-occupied units built prior to 1940
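# Corresponding column indices in boston.data: LSTAT=12, RM=5, DIS=7, AGE=6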
X = X[:, [12, 5, 7, 6]]

# *************************************
#           BUILD A ML MODEL
# *************************************
from sklearn import ensemble
clf = ensemble.GradientBoostingRegressor()
clf.fit(X, y)
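
As an optional check, you can estimate out-of-sample error with a held-out split before relying on the model. A minimal sketch (eval_clf is a name local to this example; the final model above is still fit on all the data):

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hold out a test set to estimate generalization error
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=13)
eval_clf = ensemble.GradientBoostingRegressor()
eval_clf.fit(X_train, y_train)
print('Test MSE: %.2f' % mean_squared_error(y_test, eval_clf.predict(X_test)))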

For your interest, the relative importance of the variables in the fitted model is displayed below:

[Figure: relative variable importance of LSTAT, RM, DIS and AGE]
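
If you want to reproduce the figure's numbers yourself, the fitted model exposes them directly (a minimal sketch):

# Print the relative importance of each input variable
for name, imp in zip(['LSTAT', 'RM', 'DIS', 'AGE'], clf.feature_importances_):
    print('%s: %.3f' % (name, imp))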

Saving the model

from sklearn.externals import joblib
joblib.dump(clf, 'boston_property_pricing_gbm.pkl')
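
As a quick, optional sanity check, you can reload the pickle and confirm it reproduces the in-memory model's predictions (clf2 is a name local to this example):

clf2 = joblib.load('boston_property_pricing_gbm.pkl')
# Predictions from the reloaded model should match the original
assert np.allclose(clf.predict(X[:5]), clf2.predict(X[:5]))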

Creating knowledge.py

  1. knowledge.py instructs the service how to load saved models, run them and return results. The only requirement is that this file must define a function named run that takes one input argument.
from sklearn.externals import joblib
import math
clf = joblib.load('boston_property_pricing_gbm.pkl')

def run(data):
    model_result = clf.predict([[data['LSTAT'], data['RM'], data['DIS'], data['AGE']]])
    return {'predicted_property_price': round(model_result[0], 2)}
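
Before uploading, you can smoke-test knowledge.py locally. In the sketch below, the sample values are taken from the first row of the Boston data, so the prediction should be close to that row's target value (24.0), which the model saw during training:

if __name__ == '__main__':
    # Local smoke test with values from the first row of the dataset
    print(run({'LSTAT': 4.98, 'RM': 6.575, 'DIS': 4.09, 'AGE': 65.2}))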

Preparing requirements.txt

$ pip freeze > requirements.txt

It will look like the following:

argparse==1.2.1
numpy==1.11.2
scipy==0.18.1
scikit-learn==0.18.1
wsgiref==0.1.2
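
To confirm that requirements.txt is complete, you can rebuild the environment from it in a fresh virtual environment (ENV2 is a hypothetical name used only for this check):

$ virtualenv ENV2
$ source ENV2/bin/activate
$ pip install -r requirements.txt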

Later, we will need to upload knowledge.py (the knowledge file), requirements.txt (the requirement file) and boston_property_pricing_gbm.pkl (a miscellaneous file). Note that the script used to create the GBM model does not have to be uploaded.