Spark / SPARK-1850

Bad exception if multiple jars exist when running PySpark


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.1
    • Component/s: PySpark
    • Labels: None

    Description

      Found multiple Spark assembly jars in /Users/andrew/Documents/dev/andrew-spark/assembly/target/scala-2.10:
      Traceback (most recent call last):
        File "/Users/andrew/Documents/dev/andrew-spark/python/pyspark/shell.py", line 43, in <module>
          sc = SparkContext(os.environ.get("MASTER", "local[*]"), "PySparkShell", pyFiles=add_files)
        File "/Users/andrew/Documents/dev/andrew-spark/python/pyspark/context.py", line 94, in __init__
          SparkContext._ensure_initialized(self, gateway=gateway)
        File "/Users/andrew/Documents/dev/andrew-spark/python/pyspark/context.py", line 180, in _ensure_initialized
          SparkContext._gateway = gateway or launch_gateway()
        File "/Users/andrew/Documents/dev/andrew-spark/python/pyspark/java_gateway.py", line 49, in launch_gateway
          gateway_port = int(proc.stdout.readline())
      ValueError: invalid literal for int() with base 10: 'spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4-deps.jar\n'
      

      PySpark tries to read the Java gateway port as an int from the sub-process's STDOUT. Here, however, what it read was an error message, which is clearly not an int. We should differentiate between these cases and simply propagate the original message when it is not an int; right now, this exception is not very helpful.
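      One way to differentiate the cases described above: attempt the int conversion, and if it fails, surface the sub-process output in the error instead of raising a bare ValueError. This is a minimal sketch, not the actual patch; the `proc` object and the helper name are assumptions mirroring the traceback:

      ```python
      def read_gateway_port(proc):
          """Read the Py4J gateway port from the launcher sub-process.

          If the first line of STDOUT is not an integer (e.g. an error
          message such as "Found multiple Spark assembly jars in ..."),
          propagate that output instead of a cryptic ValueError.
          """
          line = proc.stdout.readline().decode("utf-8").strip()
          try:
              return int(line)
          except ValueError:
              # Collect any remaining output so the user sees the real error.
              rest = proc.stdout.read().decode("utf-8")
              raise RuntimeError(
                  "Launching the Java gateway failed; sub-process output:\n"
                  + line + "\n" + rest)
      ```

      With this, the multiple-assembly-jars message would reach the user verbatim rather than being swallowed by the int() conversion.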


      People

        Assignee: Unassigned
        Reporter: Andrew Or (andrewor14)
        Votes: 0
        Watchers: 2
