Description
When JVM debugging options are set in conf/java-opts, PySpark fails while creating the SparkContext. The java-opts file looks like the following:
-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005
Here's the error:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Library/Python/2.7/site-packages/IPython/utils/py3compat.pyc in execfile(fname, *where)
    202         else:
    203             filename = fname
--> 204         __builtin__.execfile(filename, *where)

/Users/pat/Projects/spark/python/pyspark/shell.py in <module>()
     41 SparkContext.setSystemProperty("spark.executor.uri", os.environ["SPARK_EXECUTOR_URI"])
     42
---> 43 sc = SparkContext(os.environ.get("MASTER", "local[*]"), "PySparkShell", pyFiles=add_files)
     44
     45 print("""Welcome to

/Users/pat/Projects/spark/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway)
     92         tempNamedTuple = namedtuple("Callsite", "function file linenum")
     93         self._callsite = tempNamedTuple(function=None, file=None, linenum=None)
---> 94         SparkContext._ensure_initialized(self, gateway=gateway)
     95
     96         self.environment = environment or {}

/Users/pat/Projects/spark/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
    172         with SparkContext._lock:
    173             if not SparkContext._gateway:
--> 174                 SparkContext._gateway = gateway or launch_gateway()
    175                 SparkContext._jvm = SparkContext._gateway.jvm
    176                 SparkContext._writeToFile = SparkContext._jvm.PythonRDD.writeToFile

/Users/pat/Projects/spark/python/pyspark/java_gateway.pyc in launch_gateway()
     44     proc = Popen(command, stdout=PIPE, stdin=PIPE)
     45     # Determine which ephemeral port the server started on:
---> 46     port = int(proc.stdout.readline())
     47     # Create a thread to echo output from the GatewayServer, which is required
     48     # for Java log output to show up:

ValueError: invalid literal for int() with base 10: 'Listening for transport dt_socket at address: 5005\n'
Note that when you use JVM debugging, the very first line of output (e.g. when running spark-shell) looks like this:
Listening for transport dt_socket at address: 5005
Issue Links
- depends upon SPARK-2313: PySpark should accept port via a command line argument rather than STDIN (Resolved)
FYI ahirreddy, matei: here's the pyspark issue I was talking to you guys about.