Description
The SparkR function spark.getSparkFiles fails when called on executors. For example, the following R code fails. (See error logs in attachment.)
spark.addFile("./README.md")
seq <- seq(from = 1, to = 10, length.out = 5)
train <- function(seq) {
  path <- spark.getSparkFiles("README.md")
  print(path)
}
spark.lapply(seq, train)
However, the equivalent code runs successfully with the Scala API:
import org.apache.spark.SparkFiles
sc.addFile("./README.md")
sc.parallelize(Seq(0)).map { _ => SparkFiles.get("README.md") }.first()
and also with the Python API:
from pyspark import SparkFiles
sc.addFile("./README.md")
sc.parallelize(range(1)).map(lambda x: SparkFiles.get("README.md")).first()