[SPARK-3772] RDD operation on IPython REPL failed with an illegal port number - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.2.0
Fix Version/s: 1.2.0
Component/s: PySpark
Labels:
- pyspark
Environment:

Mac OS X 10.9.5, Python 2.7.8, IPython 2.2.0

Target Version/s:

1.2.0

Description

To reproduce this issue, we should execute following commands on the commit: 6e27cb630de69fa5acb510b4e2f6b980742b1957.

$ PYSPARK_PYTHON=ipython ./bin/pyspark
...
In [1]: file = sc.textFile('README.md')
In [2]: file.first()
...
14/10/03 08:50:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/03 08:50:13 WARN LoadSnappy: Snappy native library not loaded
14/10/03 08:50:13 INFO FileInputFormat: Total input paths to process : 1
14/10/03 08:50:13 INFO SparkContext: Starting job: runJob at PythonRDD.scala:334
14/10/03 08:50:13 INFO DAGScheduler: Got job 0 (runJob at PythonRDD.scala:334) with 1 output partitions (allowLocal=true)
14/10/03 08:50:13 INFO DAGScheduler: Final stage: Stage 0(runJob at PythonRDD.scala:334)
14/10/03 08:50:13 INFO DAGScheduler: Parents of final stage: List()
14/10/03 08:50:13 INFO DAGScheduler: Missing parents: List()
14/10/03 08:50:13 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at RDD at PythonRDD.scala:44), which has no missing parents
14/10/03 08:50:13 INFO MemoryStore: ensureFreeSpace(4456) called with curMem=57388, maxMem=278019440
14/10/03 08:50:13 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.4 KB, free 265.1 MB)
14/10/03 08:50:13 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (PythonRDD[2] at RDD at PythonRDD.scala:44)
14/10/03 08:50:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/10/03 08:50:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1207 bytes)
14/10/03 08:50:13 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/10/03 08:50:14 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException: port out of range:1027423549
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:188)
at java.net.Socket.<init>(Socket.java:244)
at org.apache.spark.api.python.PythonWorkerFactory.createSocket$1(PythonWorkerFactory.scala:75)
at org.apache.spark.api.python.PythonWorkerFactory.liftedTree1$1(PythonWorkerFactory.scala:90)
at org.apache.spark.api.python.PythonWorkerFactory.createThroughDaemon(PythonWorkerFactory.scala:89)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:62)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:100)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:182)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:744)

Attachments

Issue Links

links to

[Github] Pull Request #2651 (JoshRosen)

Activity

People

Assignee:: Unassigned

Reporter:: Tomohiko K.

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 03/Oct/14 00:03

Updated:: 09/Oct/14 23:08

Resolved:: 09/Oct/14 23:08