The transform method of predictors in spark.ml has a relatively high latency when predicting single instances or small batches, mainly because of the overhead introduced by DataFrame operations. For a text classification task on the RCV1 dataset, changing the access level of the low-level "predict" method from protected to public and using it to make predictions reduced the latency of single predictions by three to four orders of magnitude and that of batches by 50%. While the transform method is flexible and sufficient for general usage, exposing the low-level predict method in the public API can benefit many applications that require low-latency responses.
I performed an experiment to measure the latency of single-instance predictions in Spark and some other popular ML toolkits. Specifically, I'm looking at the time it takes to predict or classify a feature vector residing in memory after the model is trained.
For each toolkit in the table below, logistic regression was trained on the Reuters RCV1 dataset, which contains 697,641 documents and 47,236 features stored in LIBSVM format along with binary labels. The wall-clock time required to classify each document in a sample of 100,000 documents was then measured, and the 50th, 90th, and 99th percentiles and the maximum time are reported.
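The measurement procedure described above can be sketched roughly as follows. This is only an illustration, not the harness used for the reported numbers: the function names time_calls, percentile, and summarize are hypothetical, and the nearest-rank percentile definition is an assumption, since the report does not say which definition was used.

```python
import time

def time_calls(predict, instances):
    """Time each single-instance call; return latencies in milliseconds."""
    latencies = []
    for x in instances:
        start = time.perf_counter_ns()
        predict(x)
        latencies.append((time.perf_counter_ns() - start) / 1e6)
    return latencies

def percentile(values, p):
    """Nearest-rank percentile for an integer p in 1..100 (assumed definition)."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def summarize(latencies):
    """Collect the four statistics reported in the table."""
    return {
        "P50": percentile(latencies, 50),
        "P90": percentile(latencies, 90),
        "P99": percentile(latencies, 99),
        "Max": max(latencies),
    }
```

Calling summarize(time_calls(model_predict, sample)) with any single-argument prediction function (model_predict is a placeholder here) would yield one row of statistics of the kind shown in the table.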
All toolkits were tested on a desktop machine with an i7-6700 processor and 16 GB memory, running Ubuntu 14.04 and OpenBLAS. The wall clock resolution is 80ns for Python and 20ns for Scala.
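The report does not say how the clock resolution was determined. One plausible way to estimate it in Python, shown here purely as an assumption, is to take the smallest nonzero gap between consecutive readings of the timer. The helper name estimate_clock_resolution_ns is hypothetical, and the result also includes the cost of the timer call itself, so it is an upper bound rather than the true resolution.

```python
import time

def estimate_clock_resolution_ns(samples=100_000):
    """Return the smallest nonzero gap (ns) seen between consecutive readings.

    Returns None if no nonzero gap was observed in the given number of samples.
    """
    smallest = None
    previous = time.perf_counter_ns()
    for _ in range(samples):
        current = time.perf_counter_ns()
        delta = current - previous
        if delta > 0 and (smallest is None or delta < smallest):
            smallest = delta
        previous = current
    return smallest
```

Latencies close to this value, such as the sub-microsecond P50 figures of the modified Spark versions, are near the limit of what the timer can distinguish.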
The table below shows the latency of predictions for single instances in milliseconds, sorted by P90. Spark and Spark 2 refer to versions 1.6.1 and 2.0.0-SNAPSHOT (on master), respectively. In Spark (Modified) and Spark 2 (Modified), I changed the access level of the predict method from protected to public and used it to perform the predictions instead of transform.
||Toolkit||API||P50||P90||P99||Max||
|Spark 2 (Modified)|ML (Scala)|0.0004|0.0031|0.0087|0.3979|
|Spark (Modified)|ML (Scala)|0.0013|0.0061|0.0632|0.4924|
|Spark 2|ML (Scala)|8.4603|9.4352|13.2143|292.8733|
The results show that spark.mllib has the lowest latency among all toolkits and APIs, which can be attributed to its low-level prediction function that operates directly on the feature vector. However, spark.ml has a relatively high latency, on the order of 3ms for Spark 1.6.1 and 10ms for Spark 2.0.0. Profiling the transform method of logistic regression in spark.ml showed that only 0.01% of the time is spent computing the dot product and logit transformation, while the rest is dominated by DataFrame operations (mostly the "withColumn" operation that appends the prediction column(s) to the input DataFrame). The results of the modified versions of spark.ml, which use the predict method directly, validate this observation: the latency drops by three to four orders of magnitude, from milliseconds to about a microsecond or less at P50.
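To make concrete how little work the model itself does per instance, here is a minimal sketch of the dot product and logit transformation on a sparse feature vector. It is not Spark's implementation: the function names, the index/value representation of the sparse vector, and the default threshold of 0.5 are all assumptions made for illustration.

```python
import math

def predict_probability(weights, intercept, indices, values):
    """Logistic model score for a sparse vector given as parallel index/value lists."""
    margin = intercept
    for i, v in zip(indices, values):
        margin += weights[i] * v  # only nonzero features contribute
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)

def predict(weights, intercept, indices, values, threshold=0.5):
    """Binary label: 1.0 if the probability exceeds the (assumed) threshold."""
    p = predict_probability(weights, intercept, indices, values)
    return 1.0 if p > threshold else 0.0
```

The cost is proportional to the number of nonzero features in the document, with no allocation beyond a few floats, which is consistent with this step accounting for a tiny fraction of the transform time.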
Since Spark splits batch predictions into a series of single-instance predictions, reducing the latency of single predictions can lead to lower latencies in batch predictions. I tried batch predictions in spark.ml (1.6.1) using testing_features.map(x => model.predict(x)).collect() instead of model.transform(testing_dataframe).select("prediction").collect(), and the former had roughly 50% less latency for batches of size 1000, 10,000, and 100,000.
Although the experiment is limited to logistic regression, other predictors in the classification, regression, and clustering modules can suffer from the same problem, since it is caused by the DataFrame overhead and not by the model itself. Therefore, changing the access level of the predict method in all predictors to public can benefit applications that require low latency and give programmers more flexibility.
SPARK-10884 Support prediction on single instance for regression and classification related models