Appendix 1. (Advanced) Preparing microservices with dependencies in R

Creating a virtual environment

If your R microservice has dependencies, you must let Knowru know about them by providing a list of packages and their versions. Knowru assumes that you use packrat to create a virtual environment and maintain the list of dependencies. This guide walks through an example of preparing files for Knowru, beginning with setting up a virtual environment using packrat.

  1. Initialize a packrat environment

    1. Open your favorite R editor, such as RStudio
    2. Save a new file into a directory of your choice
    3. Set the working directory to the source file location by clicking: [Session] – [Set Working Directory] – [To Source File Location]. Alternatively, run the following in your R console:
    > setwd("~/your/directory")
    
    4. Install packrat
    > install.packages("packrat")
    
    5. Initialize your packrat environment
    > library(packrat)
    
    > packrat::init()
    

Developing a model

In this example, we use the German credit data to build a model that predicts the likelihood of default when a customer takes out a loan.

# *************************************
#              READ DATA
# *************************************
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data"
col.names <- c(
    'Status of existing checking account', 'Duration in month', 'Credit history'
    , 'Purpose', 'Credit amount', 'Savings account/bonds'
    , 'Employment years', 'Installment rate in percentage of disposable income'
    , 'Personal status and sex', 'Other debtors / guarantors', 'Present residence since'
    , 'Property', 'Age in years', 'Other installment plans', 'Housing', 'Number of existing credits at this bank'
    , 'Job', 'Number of people being liable to provide maintenance for', 'Telephone', 'Foreign worker', 'Status'
)
data <- read.csv(url, header=FALSE, sep=' ', col.names=col.names)

# *************************************
#         BUILD A ML MODEL
# *************************************
library(rpart)
german.credit.decision.tree <- rpart(
    Status ~ Status.of.existing.checking.account + Duration.in.month + Credit.history + Savings.account.bonds
    , method="class"
    , data=data
)
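
In the UCI documentation for this data set, Status is coded as 1 (good credit) and 2 (bad credit), and with method="class" rpart treats these two values as classes. As an optional sketch, the codes can be turned into a labelled factor so that the class order is explicit. The short vector below is a toy stand-in for data$Status, and the names status and status.factor are illustrative only.

```r
# Toy stand-in for data$Status (1 = good, 2 = bad in the UCI coding).
status <- c(1, 2, 1, 1, 2)

# Convert the numeric codes to a labelled factor.
# The level order is fixed explicitly: "good" first, "bad" second.
status.factor <- factor(status, levels = c(1, 2), labels = c("good", "bad"))

# Count how many records fall into each class.
print(table(status.factor))
```

Because "bad" stays the second level, the second column of the class probabilities still refers to the default class.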

This is what the credit model looks like:

_images/append_R_1.png

Saving the model

# *************************************
#         SAVE THE ML MODEL
# *************************************
save(german.credit.decision.tree, file='GermanCreditDecisionTree.RData')
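
Before uploading, it can be worth confirming that the saved file really contains the object you expect. The sketch below shows one way to do this with base R only. It uses a stand-in linear model on the built-in cars data and a temporary file rather than the credit model, and the names stand.in.model, model.path and check.env are illustrative.

```r
# Stand-in model fitted on a data set that ships with R.
stand.in.model <- lm(dist ~ speed, data = cars)

# Save it to a temporary .RData file.
model.path <- tempfile(fileext = ".RData")
save(stand.in.model, file = model.path)

# Load into a fresh environment so that objects already in the
# workspace cannot hide a missing or misnamed object in the file.
check.env <- new.env()
loaded.names <- load(model.path, envir = check.env)

# load() returns the names of the objects it restored.
print(loaded.names)
stopifnot("stand.in.model" %in% loaded.names)
stopifnot(inherits(check.env$stand.in.model, "lm"))
```

For the example in this guide, the same check would be run against GermanCreditDecisionTree.RData and the name german.credit.decision.tree.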

Creating a knowledge.R file

knowledge.R tells Knowru how to load saved models, run them and return results. The only requirement is that this file must define a function named run that takes one input argument (the argument will contain data from POST requests).

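run <- function(data) {
    # Load the saved decision tree into the function's environment
    load("GermanCreditDecisionTree.RData")

    # For a classification tree, predict() returns one probability column per class
    model.result <- predict(german.credit.decision.tree, data)

    # Row 1, column 2 is the probability of the second class (Status 2) for the first record
    return(
        list(
            predicted.default.probability=round(model.result[1,2], digits=4)
        )
    )
}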
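
The requirement described above (a function named run with exactly one argument) can be checked locally before uploading. The following is a minimal sketch using base R only. It writes a trivial stand-in knowledge file to a temporary path instead of using the real knowledge.R, and the names knowledge.path, knowledge.env and the echo field are hypothetical; how Knowru itself loads the file is not shown here.

```r
# Write a minimal stand-in knowledge file to a temporary location.
knowledge.path <- tempfile(fileext = ".R")
writeLines(c(
    "run <- function(data) {",
    "    list(echo = data)",
    "}"
), knowledge.path)

# Source the file into an isolated environment so that nothing
# already defined in the workspace is picked up by mistake.
knowledge.env <- new.env()
source(knowledge.path, local = knowledge.env)

# Check that a function named run exists and takes exactly one argument.
has.run <- exists("run", envir = knowledge.env, inherits = FALSE) &&
    is.function(knowledge.env$run)
n.args <- length(formals(knowledge.env$run))
stopifnot(has.run, n.args == 1)

# Call it with a small list standing in for POST request data.
result <- knowledge.env$run(list(Duration.in.month = 12))
```

To check the real file, point knowledge.path at your own knowledge.R instead of the temporary file.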

Preparing packrat.lock

In your R console, create a packrat snapshot by running:

> packrat::snapshot()

This automatically creates a packrat.lock file in the packrat directory. The file will look like the following:

PackratFormat: 1.4
PackratVersion: 0.4.8.1
RVersion: 3.3.2
Repos: CRAN=https://cran.rstudio.com/,
    CRANextra=http://www.stats.ox.ac.uk/pub/RWin

Package: packrat
Source: CRAN
Version: 0.4.8-1
Hash: 6ad605ba7b4b476d84be6632393f5765
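
To review which packages and versions a lock file records, the file can be read back in R. The sketch below assumes the lock file follows the "Field: value" record layout shown above, which base R's read.dcf can parse. It recreates the excerpt in a temporary file rather than reading a real packrat/packrat.lock, and the names lock.path, records and packages are illustrative.

```r
# Recreate the lock file excerpt shown above in a temporary file.
lock.lines <- c(
    "PackratFormat: 1.4",
    "PackratVersion: 0.4.8.1",
    "RVersion: 3.3.2",
    "Repos: CRAN=https://cran.rstudio.com/,",
    "    CRANextra=http://www.stats.ox.ac.uk/pub/RWin",
    "",
    "Package: packrat",
    "Source: CRAN",
    "Version: 0.4.8-1",
    "Hash: 6ad605ba7b4b476d84be6632393f5765"
)
lock.path <- tempfile(fileext = ".lock")
writeLines(lock.lines, lock.path)

# Each blank-line-separated block becomes one row of a character matrix.
records <- read.dcf(lock.path)

# The header block has no Package field, so keep only package records.
is.package <- !is.na(records[, "Package"])
packages <- records[is.package, c("Package", "Version"), drop = FALSE]
print(packages)
```

In a real project, lock.path would point at packrat/packrat.lock in the project directory.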

All files are now prepared. We will later upload knowledge.R (knowledge file), packrat.lock (requirement file) and GermanCreditDecisionTree.RData (miscellaneous file). Note that you do not have to upload the script that creates the decision tree model.