Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.2
Fix Version/s: None
Description
Currently, if you run sparkR and accidentally pass an extra argument, it fails with an unhelpful error message. For example:
$SPARK_HOME/bin/sparkR --master yarn --deploy-mode client fooarg
This gets turned into:
Launching java with spark-submit command spark-submit "-master" "yarn" "-deploy-mode" "client" "sparkr-shell" "fooarg" /tmp/Rtmp6XBGz2/backend_port162806ea36bca
Notice that "fooarg" was placed before the /tmp file path, which is how R and the JVM know which port to connect to. SparkR eventually fails with a timeout exception after 10 seconds.
SparkR should either disallow extra arguments or ensure the ordering is correct so that the backend_port file always comes first. See https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L129
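The "disallow extra arguments" option could be sketched as follows. This is a hypothetical illustration (not Spark's actual code, and in Python rather than R): it scans the user-supplied arguments, skips recognized spark-submit flags (only an illustrative subset is listed), rejects any stray positional argument like "fooarg", and appends the backend_port file path last so the shell always finds it in a known position.

```python
def build_sparkr_command(user_args, backend_port_file):
    """Hypothetical sketch of the suggested fix: reject stray positional
    arguments so the backend_port file path always ends up in a known,
    predictable position for sparkr-shell."""
    # Illustrative subset of spark-submit flags that consume a value.
    flags_with_values = {"--master", "--deploy-mode"}
    i = 0
    while i < len(user_args):
        arg = user_args[i]
        if arg in flags_with_values:
            i += 2  # skip the flag and its value
        elif arg.startswith("--"):
            i += 1  # boolean-style flag
        else:
            # A bare positional like "fooarg" would otherwise be inserted
            # before the backend_port file and break the port handshake.
            raise ValueError(f"unexpected argument: {arg!r}")
    return user_args + ["sparkr-shell", backend_port_file]
```

With this check, the failing invocation from the description would be rejected immediately with a clear message instead of timing out after 10 seconds.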