R package for converting R models to PMML
This package complements the standard [`pmml` package](http://cran.r-project.org/web/packages/pmml/):
- It supports several model types (e.g. `gbm`, `iForest`, `ranger`, `xgb.Booster`) that are not supported by the standard `pmml` package.
- It is extremely fast and memory efficient. For example, it can convert a typical `randomForest` model to a PMML file in a few seconds, whereas the standard `pmml` package requires several hours to do the same.
Prerequisites:
- Java 1.7 or newer. The Java executable must be available on the system path.
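One way to check the Java prerequisite from within R is sketched below. It relies only on base R functions (`Sys.which` and `system2`) and is not part of the r2pmml package; treat it as a convenience check rather than an official diagnostic.

```r
# Locate the Java executable on the system path (base R only)
java.path = Sys.which("java")

if (!nzchar(java.path)) {
  message("No Java executable was found on the system path")
} else {
  message("Found Java executable: ", java.path)
  # Ask the JVM to report its version; `java -version` writes to standard error
  system2(java.path, args = "-version")
}
```

If the first branch is taken, adding the directory that contains the Java executable to the `PATH` environment variable before starting R should resolve it.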
Installing the package from its GitHub repository using the [`devtools` package](http://cran.r-project.org/web/packages/devtools/):
library("devtools")
install_github(repo = "jpmml/r2pmml")
Loading the package:
library("r2pmml")
Training and exporting a simple `randomForest` model:
library("randomForest")
library("r2pmml")
data(iris)
# Train a model using raw Iris data
iris.rf = randomForest(Species ~ ., data = iris, ntree = 7)
print(iris.rf)
# Export the model to PMML
r2pmml(iris.rf, "iris_rf.pmml")
The `r2pmml` function takes an optional argument `preProcess`, which associates the model with data pre-processing transformations.
Training and exporting a more sophisticated `randomForest` model:
library("caret")
library("randomForest")
library("r2pmml")
data(iris)
# Create a preprocessor
iris.preProcess = preProcess(iris, method = c("range"))
# Use the preprocessor to transform raw Iris data to pre-processed Iris data
iris.transformed = predict(iris.preProcess, newdata = iris)
# Train a model using pre-processed Iris data
iris.rf = randomForest(Species ~., data = iris.transformed, ntree = 7)
print(iris.rf)
# Export the model to PMML.
# Pass the preprocessor as the `preProcess` argument
r2pmml(iris.rf, preProcess = iris.preProcess, "iris_rf.pmml")
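To make the `range` pre-processing step concrete, the sketch below reproduces the same kind of min-max scaling by hand in base R. It assumes that the `range` method of `caret` rescales each numeric column to the interval [0, 1] (its default bounds); the helper name `rescale` is made up for this illustration and does not come from either package.

```r
data(iris)

# Hypothetical helper: min-max scaling of a numeric vector to [0, 1]
rescale = function(x){
  (x - min(x)) / (max(x) - min(x))
}

# Scale the four numeric columns and leave the Species factor untouched
iris.manual = iris
iris.manual[, 1:4] = lapply(iris[, 1:4], rescale)

print(summary(iris.manual))
```

Passing the `caret` preprocessor object to `r2pmml`, as in the example above, is what allows the exported PMML file to accept raw (unscaled) inputs; a manual transformation like this one would not be recorded in the PMML file.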
Alternatively, it is possible to associate `lm` and `glm` models with data pre-processing transformations via [model formulae](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html).
Supported model formula features:
- Interaction terms.
- `I(..)` expression terms:
  - The `if` expression.
  - Logical operators `&`, `|` and `!`.
  - Relational operators `==`, `!=`, `<`, `<=`, `>=` and `>`.
  - Arithmetic operators `+`, `-`, `/` and `*`.
  - Exponentiation operators `^` and `**`.
  - The `is.na` function.
  - Arithmetic functions `abs`, `ceiling`, `exp`, `floor`, `log`, `log10`, `round` and `sqrt`.
- The `cut()` function terms.
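As a small illustration of how these features combine, the sketch below fits an `lm` model on the built-in `mtcars` dataset using one interaction term, two `I(..)` terms and one `cut()` term, and then lists the resulting formula terms. The dataset, the variable choices and the break points are assumptions made for this example only; whether a particular combination of terms converts cleanly has not been verified here.

```r
data(mtcars)

# One interaction term, two I(..) expression terms and one cut() term
cars.lm = lm(
  mpg ~ wt:hp + I(abs(disp - 200)) + I(sqrt(hp)) + cut(qsec, breaks = c(10, 16, 18, 25)),
  data = mtcars
)

# List the terms that the formula expands to
cars.labels = attr(terms(cars.lm), "term.labels")
print(cars.labels)

# Exporting would follow the same pattern as the other examples:
# r2pmml(cars.lm, "cars_lm.pmml")
```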
Training and exporting a `glm` model:
library("r2pmml")
# Load and prepare the Auto-MPG dataset
auto = read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data", quote = "\"", header = FALSE, na.strings = "?", row.names = NULL, col.names = c("mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "model_year", "origin", "car_name"))
auto$origin = as.factor(auto$origin)
auto$car_name = NULL
auto = na.omit(auto)
# Train a model
auto.glm = glm(mpg ~ (. - horsepower - weight) ^ 2 + I(displacement / cylinders) + cut(horsepower, breaks = c(0, 50, 100, 150, 200, 250)) + I(log(weight)), data = auto)
# Export the model to PMML
r2pmml(auto.glm, "auto_glm.pmml")
Training and exporting a `ranger` model:
library("ranger")
library("r2pmml")
data(iris)
# Train a model.
# Keep the forest data structure by specifying `write.forest = TRUE`
iris.ranger = ranger(Species ~ ., data = iris, num.trees = 7, write.forest = TRUE)
print(iris.ranger)
# Export the model to PMML.
# Pass the levels of all factor variables as the `variable.levels` argument
r2pmml(iris.ranger, variable.levels = sapply(iris, levels), "iris_ranger.pmml")
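The value passed as `variable.levels` in the example above can be inspected on its own. The base R sketch below shows what `sapply(iris, levels)` produces for the Iris data: a named list with one entry per column, where numeric columns have no levels and the factor column carries its category levels.

```r
data(iris)

# Collect the levels of every column; numeric columns yield NULL
iris.levels = sapply(iris, levels)

# Inspect the structure of the result
str(iris.levels)
```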
Training and exporting an `xgb.Booster` model:
library("xgboost")
library("r2pmml")
data(iris)
iris_x = iris[, 1:4]
iris_y = as.integer(iris[, 5]) - 1
# Train a model
iris.xgb = xgboost(data = as.matrix(iris_x), label = iris_y, missing = NA, objective = "multi:softmax", num_class = 3, nrounds = 13)
# Create a feature map
iris.fmap = data.frame(
	"id" = seq(from = 0, to = ncol(iris_x) - 1),
	"name" = names(iris_x),
	"type" = rep("q", ncol(iris_x))
)
# Export the model to PMML.
# Pass the feature map as the `fmap` argument.
# Pass the name and category levels of the target field as `response_name` and `response_levels` arguments, respectively.
# Pass the value that represents missing values as the `missing` argument
r2pmml(iris.xgb, fmap = iris.fmap, response_name = "Species", response_levels = c("setosa", "versicolor", "virginica"), missing = NA, "iris_xgb.pmml")
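The `response_levels` argument in the example above has to list the categories in the same order as the zero-based integer labels used for training. The base R sketch below shows one way to derive both from the same factor instead of typing the levels by hand; the variable names `iris.response_levels` and `decoded` are introduced here for illustration only.

```r
data(iris)

# Zero-based class labels, as expected by the multi-class objective
iris_y = as.integer(iris$Species) - 1

# Category levels in the same order as the integer encoding
iris.response_levels = levels(iris$Species)

# Decode the labels back to category names to confirm that the two line up
decoded = iris.response_levels[iris_y + 1]
print(table(iris_y, decoded))
```

With this in place, `response_levels = iris.response_levels` could be passed to `r2pmml` in place of the hard-coded character vector.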
Tweaking JVM configuration:
Sys.setenv(JAVA_TOOL_OPTIONS = "-Xms4G -Xmx8G")
r2pmml(iris.rf, "iris_rf.pmml")
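Setting `JAVA_TOOL_OPTIONS` with `Sys.setenv` affects every Java process started from the R session afterwards. If that is not desired, the variable can be set for a single call and restored afterwards. The sketch below uses a hypothetical helper named `with_java_options`, which is not part of the r2pmml package, and assumes that the conversion picks up the variable at the time of the call, as the example above implies.

```r
# Hypothetical helper: evaluate `expr` with JAVA_TOOL_OPTIONS temporarily set
with_java_options = function(options, expr){
  old = Sys.getenv("JAVA_TOOL_OPTIONS", unset = NA)
  Sys.setenv(JAVA_TOOL_OPTIONS = options)
  # Restore the previous state when the function exits, even on error
  on.exit({
    if (is.na(old)) {
      Sys.unsetenv("JAVA_TOOL_OPTIONS")
    } else {
      Sys.setenv(JAVA_TOOL_OPTIONS = old)
    }
  })
  # `expr` is evaluated lazily, that is, after the variable has been set
  force(expr)
}

# Demonstrate the effect without starting a JVM
seen = with_java_options("-Xms4G -Xmx8G", Sys.getenv("JAVA_TOOL_OPTIONS"))
print(seen)
```

A conversion could then be wrapped as `with_java_options("-Xms4G -Xmx8G", r2pmml(iris.rf, "iris_rf.pmml"))`.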
Employing a custom converter class:
r2pmml(iris.rf, "iris_rf.pmml", converter = "com.mycompany.MyRandomForestConverter", converter_classpath = "/path/to/myconverter-1.0-SNAPSHOT.jar")
Removing the package:
remove.packages("r2pmml")
R2PMML is licensed under the [GNU Affero General Public License (AGPL) version 3.0](http://www.gnu.org/licenses/agpl-3.0.html). Other licenses are available on request.
Please contact [info@openscoring.io](mailto:info@openscoring.io).