Description
To run lapply in distributed mode via Zeppelin, one has to call SparkR:::lapply instead of spark.lapply.
According to the Spark documentation, spark.lapply should work.
Steps to reproduce:
Build Zeppelin with the R profiles enabled:
mvn clean install -DskipTests -Drat.skip=true -Pspark-2.0 -Phadoop-2.4 -Pyarn -Ppyspark -Psparkr -Pr -Pscala-2.11
Failing paragraph:
%spark.r
families <- c("gaussian", "poisson")
df <- createDataFrame(iris)
train <- function(family){
  model <- glm(Sepal.Length ~ Sepal.Width + Species, iris, family = family)
  summary(model)
}
model.summaries <- spark.lapply(families, train)
Passing paragraph:
%spark.r
families <- c("gaussian", "poisson")
df <- createDataFrame(iris)
train <- function(family){
  model <- glm(Sepal.Length ~ Sepal.Width + Species, iris, family = family)
  summary(model)
}
model.summaries <- SparkR:::lapply(families, train)
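To rule out the train function itself as the cause, it may help to run it with base R's lapply, outside of Spark entirely. The following is a minimal sketch that assumes only base R and the built-in iris dataset. The name local.summaries is illustrative and not part of the original report, and the poisson fit may emit non-integer warnings because Sepal.Length is not a count.

```r
# Sanity check without Spark: apply the same train function locally.
families <- c("gaussian", "poisson")

train <- function(family) {
  # Fit a GLM on the local iris data frame for the given family name.
  model <- glm(Sepal.Length ~ Sepal.Width + Species, data = iris, family = family)
  summary(model)
}

# Base R lapply runs sequentially in the local R session.
# local.summaries is a hypothetical name used only for this check.
local.summaries <- lapply(families, train)
```

If this runs cleanly, the failure is more likely in how the interpreter resolves spark.lapply than in the function being applied.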