Creating an API for an H2O Random Forest Model

H2O is widely regarded as one of the best machine learning platforms, thanks to its many advanced algorithms, its interfaces to R, Python and Scala, its ability to work with big data frameworks and its scalable in-memory architecture. In this blog post, we will see how to create a microservice that makes predictions for a given input using an H2O random forest model.

First, let us create an H2O model. Building the model is not the focus of this blog post, so we will borrow an example from a tutorial that H2O provides. The tutorial demonstrates how to create a random forest model that predicts a forest cover type from cartographic variables (for more details, please see these Kaggle and UIC pages). An abbreviated version of the code is presented below (slightly modified to prepare the files we will upload to Knowru).

# Copied from: https://github.com/h2oai/h2o-tutorials/blob/master/tutorials/gbm-randomforest/GBM_RandomForest_Example.R
# To later get a list of dependencies
library(packrat)
packrat::init()

# Initialize H2O environment
library(h2o)
h2o.init()

# Import data
df <- h2o.importFile(path = normalizePath("covtype.full.csv"))  # You can download this data file from here: https://github.com/h2oai/h2o-tutorials/blob/master/tutorials/data/covtype.full.csv

# Split data
splits <- h2o.splitFrame(df, c(0.6,0.2), seed=1234)
train <- h2o.assign(splits[[1]], "train.hex")   
valid <- h2o.assign(splits[[2]], "valid.hex")
test <- h2o.assign(splits[[3]], "test.hex")

# Train model
rf2 <- h2o.randomForest(
  training_frame = train,
  validation_frame = valid,
  x = 1:12,
  y = 13,
  model_id = "rf_covType2",
  ntrees = 20,
  max_depth = 3,
  stopping_rounds = 2,
  stopping_tolerance = 1e-2,
  score_each_iteration = TRUE,
  seed=3000000)
summary(rf2)

# See what the output looks like
finalRf_predictions <- h2o.predict(object = rf2, newdata = test[1,])
finalRf_predictions
##   predict   class_1   class_2      class_3      class_4     class_5     class_6     class_7
## 1 class_1 0.5652644 0.1720687 0.0027895756 0.0002420710 0.001761907 0.001831265 0.256042076
## 2 class_1 0.5745533 0.4139127 0.0007668296 0.0002651079 0.003204278 0.001022852 0.006274899
## 3 class_1 0.5497612 0.4337398 0.0040469172 0.0002789891 0.003372056 0.001076409 0.007724670

library(jsonlite)
toJSON(as.data.frame(test[1,]))
## [{"Elevation":3191,"Aspect":45,"Slope":19,"Horizontal_Distance_To_Hydrology":323,"Vertical_Distance_To_Hydrology":88,"Horizontal_Distance_To_Roadways":3932,"Hillshade_9am":221,"Hillshade_Noon":195,"Hillshade_3pm":100,"Horizontal_Distance_To_Fire_Points":2919,"Wilderness_Area":"area_0","Soil_Type":"type_39","Cover_Type":"class_2"}] 

toJSON(as.data.frame(finalRf_predictions))
## [{"predict":"class_1","class_1":0.5653,"class_2":0.1721,"class_3":0.0028,"class_4":0.0002,"class_5":0.0018,"class_6":0.0018,"class_7":0.256}] 

# Save the model to a file
h2o.saveModel(object=rf2, path=getwd(), force=TRUE)

# Get a list of dependencies
packrat::snapshot()
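The JSON strings in the listing above show four decimal places (0.5653), although the printed frame shows 0.5652644. This comes from jsonlite's default rounding (`digits = 4`), not from a change in the model output. A minimal sketch, using a value typed by hand from the output above rather than a live H2O frame:

```r
library(jsonlite)

# Probability copied by hand from the printed prediction above.
p <- list(class_1 = 0.5652644)

# Default behaviour: toJSON() rounds numbers to 4 decimal digits.
rounded <- toJSON(p, auto_unbox = TRUE)

# digits = NA asks jsonlite for maximum precision instead.
full <- toJSON(p, auto_unbox = TRUE, digits = NA)

print(rounded)
print(full)
```

If the consumers of your API need more precision than four digits, passing `digits` explicitly wherever you serialize yourself is one option; how the platform serializes the return value of `run` is not covered here.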

Now we need to write knowledge.R, which tells Knowru how to take input values from requests, process them (here with a machine learning model, although a model is not required) and return values for responses. It takes only a few lines.

# Initialize H2O environment
library(h2o)
h2o.init()

# Load the saved model
rf2 <- h2o.loadModel('rf_covType2')

# Define a run function that will get called when a request comes in. The input argument will have the POST data.
run <- function(input) {
  h2o_input <- as.h2o(input)
  result <- h2o.predict(rf2, newdata = h2o_input)
  return(as.list(as.data.frame(result)))
}
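The run function above passes whatever it receives straight to H2O. A request that omits a predictor may then fail with a less readable error, so one option is a small guard at the top of run. The sketch below is only an illustration: `check_input` and `required_fields` are hypothetical names, not part of Knowru or H2O, and the field list is taken from the twelve predictor columns (`x = 1:12`) of the training data. Cover_Type is the response column, so it is left out of the required list here, even though the examples in this post send it anyway.

```r
# Hypothetical helper, not part of Knowru or H2O: stop early when the
# request lacks a predictor column the model was trained on.
required_fields <- c(
  "Elevation", "Aspect", "Slope",
  "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
  "Horizontal_Distance_To_Roadways",
  "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
  "Horizontal_Distance_To_Fire_Points",
  "Wilderness_Area", "Soil_Type"
)

check_input <- function(input) {
  missing_fields <- setdiff(required_fields, names(input))
  if (length(missing_fields) > 0) {
    stop("Missing input fields: ", paste(missing_fields, collapse = ", "))
  }
  invisible(input)
}

# Example: a request with only two fields is rejected with a message
# that lists the fields it lacks.
incomplete <- list(Elevation = 3191, Aspect = 45)
outcome <- tryCatch(check_input(incomplete),
                    error = function(e) conditionMessage(e))
print(outcome)
```

Calling `check_input(input)` as the first line of run would surface the problem in the error message of the response instead of deep inside the prediction call.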

Before uploading to Knowru, we can check locally that knowledge.R runs without an error. If we provide an R list representation of our input value, it should return an R list representation of the output:

# First run the code above so that the run function is defined. Then:
> exampleInput <- list(Elevation=3191,Aspect=45,Slope=19,Horizontal_Distance_To_Hydrology=323,Vertical_Distance_To_Hydrology=88,Horizontal_Distance_To_Roadways=3932,Hillshade_9am=221,Hillshade_Noon=195,Hillshade_3pm=100,Horizontal_Distance_To_Fire_Points=2919,Wilderness_Area="area_0",Soil_Type="type_39",Cover_Type="class_2")
> run(exampleInput)
##   |============================================================================================================| 100%
##   |============================================================================================================| 100%
## $predict
## [1] class_1
## Levels: class_1
## 
## $class_1
## [1] 0.5652644
## 
## $class_2
## [1] 0.1720687
## 
## $class_3
## [1] 0.002789576
## 
## $class_4
## [1] 0.000242071
## 
## $class_5
## [1] 0.001761907
## 
## $class_6
## [1] 0.001831265
## 
## $class_7
## [1] 0.2560421

Now that we have checked that it runs without an error, let us create an API using Knowru. Log in using our demo account (ID: demo, password: KnowruForAPI!). Go to the Runnable page and click the Create button.

Give your API a title, choose R 3.3 for the language and add a note describing this initial version of your API.

For the Files, we need to upload the following three files:

  • knowledge.R: this file tells Knowru how the API should run; we wrote it above
  • packrat.lock: packrat creates this file automatically and saves it in the packrat directory under your working directory. The platform uses this file to install the dependencies for your API
  • rf_covType2: this is the model file we saved with h2o.saveModel; knowledge.R loads it

Note that you do not need to upload either the script that created the random forest model or the data file from which you created the model. You can download a zip of the three files here.

Then click the Create button. At first the version’s status will be building, meaning the platform is building a Docker image behind the scenes. After some seconds, it will change to docking, meaning the platform is deploying the image to EC2 instances. This process can take up to a couple of minutes.

Once the status changes to ready, click the title of the API (which we call a runnable) to see more detailed information. At the bottom, click the Run button and enter the following input to check that it works properly.

{"Elevation":3191,"Aspect":45,"Slope":19,"Horizontal_Distance_To_Hydrology":323,"Vertical_Distance_To_Hydrology":88,"Horizontal_Distance_To_Roadways":3932,"Hillshade_9am":221,"Hillshade_Noon":195,"Hillshade_3pm":100,"Horizontal_Distance_To_Fire_Points":2919,"Wilderness_Area":"area_0","Soil_Type":"type_39","Cover_Type":"class_2"}

The first couple of requests will take 4-5 seconds, whereas subsequent requests will take only about half a second. This is because the H2O environment is initialized in the background Docker containers during the first requests.
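The slow-first, fast-afterwards behaviour is typical of anything that is initialized once and then reused. The sketch below shows that general pattern in plain R. It is not how Knowru is implemented internally (that is not documented here): `expensive_init` is a hypothetical stand-in for a costly start-up step such as `h2o.init()`, and `make_lazy` is a made-up helper name.

```r
# Hypothetical stand-in for an expensive start-up step such as h2o.init();
# it only counts how often it is called.
init_calls <- 0
expensive_init <- function() {
  init_calls <<- init_calls + 1
  "ready"
}

# Wrap an initializer so that it runs on first use and is cached afterwards.
make_lazy <- function(init) {
  cached <- NULL
  function() {
    if (is.null(cached)) {
      cached <<- init()
    }
    cached
  }
}

get_env <- make_lazy(expensive_init)

# Three "requests": only the first one pays the initialization cost.
states <- vapply(1:3, function(i) get_env(), character(1))
print(init_calls)
```

In practice this means that sending one or two throwaway requests after a deployment can spare real users the slower first responses.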

Once you make a request, you will see the following return value.

You can also make a request from outside the platform. For example, you can use curl:

$ curl -X POST -d '{"input": {"Elevation":3191,"Aspect":45,"Slope":19,"Horizontal_Distance_To_Hydrology":323,"Vertical_Distance_To_Hydrology":88,"Horizontal_Distance_To_Roadways":3932,"Hillshade_9am":221,"Hillshade_Noon":195,"Hillshade_3pm":100,"Horizontal_Distance_To_Fire_Points":2919,"Wilderness_Area":"area_0","Soil_Type":"type_39","Cover_Type":"class_2"}}' -H "Content-Type: application/json" -H "Accept: application/json" -H "Authorization: Token d84eea5611ab8221f608ac0afd498513e61c6bef" https://www.knowru.com/api/runnable/r-h2o/run/
##  {
##     "created_at":"2018-01-04T23:53:43.473463-06:00"
##     ,"created_by":"demo"
##     ,"updated_at":"2018-01-04T23:53:43.473874-06:00"
##     ,"updated_by":"demo"
##     ,"runnable":"https://www.knowru.com:443/api/runnable/r-h2o/"
##     ,"runnable_version":"https://www.knowru.com/api/runnable/r-h2o/version/seventh/"
##     ,"name":"1tctp1dy0nk75p6a4sha"
##     ,"requested_at":"2018-01-04T23:53:42.720039-06:00"
##     ,"responded_at":"2018-01-04T23:53:43.473292-06:00"
##     ,"response_time":753.2529999999999
##     ,"input":{"Slope":19,"Elevation":3191,"Cover_Type":"class_2","Hillshade_3pm":100,"Wilderness_Area":"area_0","Soil_Type":"type_39","Vertical_Distance_To_Hydrology":88,"Hillshade_9am":221,"Horizontal_Distance_To_Roadways":3932,"Horizontal_Distance_To_Hydrology":323,"Horizontal_Distance_To_Fire_Points":2919,"Aspect":45,"Hillshade_Noon":195}
##     ,"output":{"predict":"class_1","class_2":0.1721,"class_3":0.0028,"class_1":0.5653,"class_6":0.0018,"class_7":0.256,"class_4":0.0002,"class_5":0.0018}
##     ,"error_type":null
##     ,"error_message":null
##     ,"standard_out_message":"[u'\\r  |                                                                            \\r  |                                                                      |   0%\\r  |                                                                            \\r  |======================================================================| 100%', u'\\r  |                                                                            \\r  |                                                                      |   0%\\r  |                                                                            \\r  |======================================================================| 100%']"
##     ,"standard_error_message":null
##     ,"status":"completed"
##     ,"url":"https://www.knowru.com/api/runnable/r-h2o/version/seventh/run/1tctp1dy0nk75p6a4sha/"
## }
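A client usually needs only a few of these fields. The sketch below parses an abbreviated copy of the response above with jsonlite (the field values are copied from the output, the rest is omitted) and pulls out the predicted class and the second most likely class. Judging by the requested_at and responded_at timestamps, response_time appears to be in milliseconds.

```r
library(jsonlite)

# Abbreviated copy of the response shown above (only a few fields kept).
response_text <- '{
  "response_time": 753.2529999999999,
  "output": {"predict":"class_1","class_2":0.1721,"class_3":0.0028,
             "class_1":0.5653,"class_6":0.0018,"class_7":0.256,
             "class_4":0.0002,"class_5":0.0018},
  "error_type": null,
  "status": "completed"
}'

response <- fromJSON(response_text)

# Only trust the output when the run completed without an error.
stopifnot(response$status == "completed", is.null(response$error_type))

predicted <- response$output$predict

# Everything except "predict" is a class probability.
probs <- unlist(response$output[names(response$output) != "predict"])
runner_up <- names(sort(probs, decreasing = TRUE))[2]

print(predicted)
print(runner_up)
```

Checking status and error_type before reading output keeps a failed run from being mistaken for a prediction.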

Or you can use Postman.
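If the caller is itself an R program, the request body can be assembled with jsonlite instead of being typed by hand. Note that the curl example wraps the fields in an "input" key, whereas the Run form on the runnable page takes the bare object. A minimal sketch that only builds the body; actually sending it needs an HTTP client (for example the httr package), which is left out here:

```r
library(jsonlite)

# Same values as the example request used throughout this post.
input <- list(
  Elevation = 3191, Aspect = 45, Slope = 19,
  Horizontal_Distance_To_Hydrology = 323,
  Vertical_Distance_To_Hydrology = 88,
  Horizontal_Distance_To_Roadways = 3932,
  Hillshade_9am = 221, Hillshade_Noon = 195, Hillshade_3pm = 100,
  Horizontal_Distance_To_Fire_Points = 2919,
  Wilderness_Area = "area_0", Soil_Type = "type_39",
  Cover_Type = "class_2"
)

# Wrap the fields in "input" as in the curl example. auto_unbox = TRUE
# writes scalars as 3191 rather than as one-element arrays like [3191].
body <- toJSON(list(input = input), auto_unbox = TRUE)
print(body)
```

Without `auto_unbox = TRUE`, every value would be serialized as a one-element array, which is not the shape the curl example sends.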

Congrats! Your H2O model API is now set up and running. If you would rather not follow these steps and just want to see how it works, please go to this pre-created API page to try it yourself (ID: demo, PW: KnowruForAPI!).

In our next blog post, we will talk about other Knowru features that you will find useful for managing your APIs, such as analytics, alerting, access control management and API documentation.

Happy APIing!
