Spark / SPARK-1284

pyspark hangs after IOError on Executor


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.1.0
    • Component/s: PySpark
    • Labels: None

    Description

      When running reduceByKey over a cached RDD, the Python worker fails with an IOError, but the task runner does not detect the failure. Spark and the pyspark shell then hang, waiting for the task to finish.

      The error is:

      PySpark worker failed with exception:
      Traceback (most recent call last):
        File "/home/hadoop/spark/python/pyspark/worker.py", line 77, in main
          serializer.dump_stream(func(split_index, iterator), outfile)
        File "/home/hadoop/spark/python/pyspark/serializers.py", line 182, in dump_stream
          self.serializer.dump_stream(self._batched(iterator), stream)
        File "/home/hadoop/spark/python/pyspark/serializers.py", line 118, in dump_stream
          self._write_with_length(obj, stream)
        File "/home/hadoop/spark/python/pyspark/serializers.py", line 130, in _write_with_length
          stream.write(serialized)
      IOError: [Errno 104] Connection reset by peer
      
      14/03/19 22:48:15 INFO scheduler.TaskSetManager: Serialized task 4.0:0 as 4257 bytes in 47 ms
      Traceback (most recent call last):
        File "/home/hadoop/spark/python/pyspark/daemon.py", line 117, in launch_worker
          worker(listen_sock)
        File "/home/hadoop/spark/python/pyspark/daemon.py", line 107, in worker
          outfile.flush()
      IOError: [Errno 32] Broken pipe
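
The two errno values in the traces above (104, connection reset, and 32, broken pipe) are what a writer typically sees once the other end of a socket has been closed. The following is a minimal standard-library sketch of that situation only. It does not involve Spark, and it assumes, without confirmation from this report, that the JVM side closed its end of the connection while the Python worker was still writing. The function name `write_after_peer_close` and its parameters are hypothetical. In Python 3, `IOError` is an alias of `OSError`.

```python
import errno
import socket


def write_after_peer_close(payload=b"x" * 4096, max_writes=1000):
    """Write to a socket whose reader has closed early.

    Returns the symbolic errno name of the resulting error,
    or None if no write failed.
    """
    writer, reader = socket.socketpair()
    try:
        writer.sendall(payload)  # first batch, reader still open
        reader.recv(16)          # reader consumes only a few bytes...
        reader.close()           # ...then goes away with data still unread
        for _ in range(max_writes):
            # Fails once the kernel reports that the peer is gone.
            writer.sendall(payload)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        reader.close()  # closing twice is harmless
        writer.close()
    return None
```

Which of the two errors is raised depends on the platform and socket type, so callers should be prepared for either.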
      

      I can reproduce the error by running take(10) on the cached RDD before running reduceByKey (which looks at the whole input file).
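
The reproduction steps above can be sketched roughly as follows. This is an illustration under assumptions, not the reporter's actual script: the helper name `reproduce`, the input `path`, and the tab-separated keying are all hypothetical, and `sc` is taken to be an existing SparkContext such as the one the pyspark shell provides.

```python
def reproduce(sc, path):
    """Follow the steps from the report using an existing SparkContext `sc`."""
    pairs = (
        sc.textFile(path)
        # Hypothetical keying: first tab-separated field, count of 1.
        .map(lambda line: (line.split("\t", 1)[0], 1))
        .cache()
    )
    # Reads only the first few records of the cached RDD.
    pairs.take(10)
    # Full pass over the input; per the report, this is where the shell hung.
    return pairs.reduceByKey(lambda a, b: a + b).collect()
```

In the pyspark shell this would be called as `reproduce(sc, "<input file>")`. On an affected build the call would be expected not to return.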

      Affects Version 1.0.0-SNAPSHOT (4d88030486)

          People

            Assignee: Davies Liu (davies)
            Reporter: Jim Blomo (jblomo)
            Votes: 0
            Watchers: 4
