  Spark / SPARK-22028

spark-submit trips over environment variables


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: Deploy
    • Environment:
      Operating System: Windows 10
      Shell: CMD or bash.exe, both with the same result

    Description

      I have a strange environment variable on my Windows system:

      C:\Path>set ""
      =::=::\
      

      According to this question on Stack Exchange, this is some sort of old MS-DOS relic that interacts with Cygwin shells.
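      The later exception can be reproduced without Spark at all: the JVM's ProcessBuilder validates variable names on put(), and on a Unix JVM (such as the cluster worker here) any '=' in the name is rejected. A minimal sketch:

      ```java
      import java.util.Map;

      public class EnvNameRepro {
          public static void main(String[] args) {
              // ProcessBuilder's environment map is backed by java.lang.ProcessEnvironment,
              // which validates names on put(); a Unix JVM rejects any name containing '='.
              Map<String, String> env = new ProcessBuilder("true").environment();
              try {
                  env.put("=::", "::\\");
                  System.out.println("accepted");
              } catch (IllegalArgumentException e) {
                  // On Linux this throws with the same message seen in the stack trace below.
                  System.out.println("rejected: " + e.getMessage());
              }
          }
      }
      ```

      (Note that a Windows JVM permits a leading '=' in a name, which is why the variable survives locally and only blows up once it reaches the worker.)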

      Leaving that aside for a moment, Spark tries to read environment variables on submit and trips over it:

      ./spark-submit.cmd
      Running Spark using the REST application submission protocol.
      Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
      17/09/15 15:57:51 INFO RestSubmissionClient: Submitting a request to launch an application in spark://********:31824.
      17/09/15 15:58:01 WARN RestSubmissionClient: Unable to connect to server spark://*******:31824.
      Warning: Master endpoint spark://********:31824 was not a REST server. Falling back to legacy submission gateway instead.
      17/09/15 15:58:02 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
      [ ... ]
      17/09/15 15:58:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
      17/09/15 15:58:08 ERROR ClientEndpoint: Exception from cluster was: java.lang.IllegalArgumentException: Invalid environment variable name: "=::"
      java.lang.IllegalArgumentException: Invalid environment variable name: "=::"
              at java.lang.ProcessEnvironment.validateVariable(ProcessEnvironment.java:114)
              at java.lang.ProcessEnvironment.access$200(ProcessEnvironment.java:61)
              at java.lang.ProcessEnvironment$Variable.valueOf(ProcessEnvironment.java:170)
              at java.lang.ProcessEnvironment$StringEnvironment.put(ProcessEnvironment.java:242)
              at java.lang.ProcessEnvironment$StringEnvironment.put(ProcessEnvironment.java:221)
              at org.apache.spark.deploy.worker.CommandUtils$$anonfun$buildProcessBuilder$2.apply(CommandUtils.scala:55)
              at org.apache.spark.deploy.worker.CommandUtils$$anonfun$buildProcessBuilder$2.apply(CommandUtils.scala:54)
              at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
              at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
              at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
              at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
              at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
              at org.apache.spark.deploy.worker.CommandUtils$.buildProcessBuilder(CommandUtils.scala:54)
              at org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:181)
              at org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:91)
      

      Please note that spark-submit.cmd is, in this case, my own script that calls the spark-submit.cmd from the Spark distribution.

      I think that shouldn't happen. Spark should handle such a malformed environment variable gracefully.
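      One way to handle this gracefully would be to skip such names when copying the submitted environment into the worker's ProcessBuilder (the loop in CommandUtils.buildProcessBuilder that the stack trace points at). A minimal sketch in plain Java, where `extraEnv` and `copySafely` are hypothetical names, not Spark's actual API:

      ```java
      import java.util.LinkedHashMap;
      import java.util.Map;

      public class SafeEnvCopy {
          // Copy entries into the child's environment, skipping names the JVM
          // would reject (empty, or containing '=' or NUL) instead of throwing.
          static void copySafely(Map<String, String> extraEnv, Map<String, String> childEnv) {
              for (Map.Entry<String, String> e : extraEnv.entrySet()) {
                  String name = e.getKey();
                  if (name.isEmpty() || name.indexOf('=') != -1 || name.indexOf('\u0000') != -1) {
                      System.err.println("Skipping malformed environment variable: \"" + name + "\"");
                      continue;
                  }
                  childEnv.put(name, e.getValue());
              }
          }

          public static void main(String[] args) {
              Map<String, String> extra = new LinkedHashMap<>();
              extra.put("=::", "::\\");            // the MS-DOS relic from this report
              extra.put("SPARK_HOME", "/opt/spark"); // a normal variable passes through
              Map<String, String> child = new LinkedHashMap<>();
              copySafely(extra, child);
              System.out.println(child.keySet());
          }
      }
      ```

      A warning instead of a hard failure would at least let the driver launch.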

      Attachments

        Activity

          People

            Assignee: Unassigned
            Reporter: Franz Wimmer
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: