Spark / SPARK-18281

toLocalIterator yields timeout error on pyspark2


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.3, 2.1.1
    • Component/s: PySpark
    • Labels: None
    • Environment:
      Ubuntu 14.04.5 LTS
      Driver: AWS m4.xlarge
      Slaves: AWS m4.4xlarge
      Mesos 1.0.1
      Spark 2.0.1
      PySpark

    Description

      I ran the example straight out of the API docs for toLocalIterator and it fails with a timeout exception:

      from pyspark import SparkContext
      sc = SparkContext()
      rdd = sc.parallelize(range(10))
      # Times out while reading the results from the local socket:
      [x for x in rdd.toLocalIterator()]
      

      conf file:
      spark.driver.maxResultSize 6G
      spark.executor.extraJavaOptions -XX:+UseG1GC -XX:MaxPermSize=1G -XX:+HeapDumpOnOutOfMemoryError
      spark.executor.memory 16G
      spark.executor.uri foo/spark-2.0.1-bin-hadoop2.7.tgz
      spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem
      spark.hadoop.fs.s3a.buffer.dir /raid0/spark
      spark.hadoop.fs.s3n.buffer.dir /raid0/spark
      spark.hadoop.fs.s3a.connection.timeout 500000
      spark.hadoop.fs.s3n.multipart.uploads.enabled true
      spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version 2
      spark.hadoop.parquet.block.size 2147483648
      spark.hadoop.parquet.enable.summary-metadata false
      spark.jars.packages com.databricks:spark-avro_2.11:3.0.1,com.amazonaws:aws-java-sdk-pom:1.10.34
      spark.local.dir /raid0/spark
      spark.mesos.coarse false
      spark.mesos.constraints priority:1
      spark.network.timeout 600
      spark.rpc.message.maxSize 500
      spark.speculation false
      spark.sql.parquet.mergeSchema false
      spark.sql.planner.externalSort true
      spark.submit.deployMode client
      spark.task.cpus 1
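
      For reference, a few of these settings can also be applied programmatically through SparkConf before the SparkContext is created; a minimal sketch using values copied from the conf file above (not additional settings):

      from pyspark import SparkConf, SparkContext

      # Same values as in the conf file, set in code instead.
      conf = (SparkConf()
              .set("spark.driver.maxResultSize", "6G")
              .set("spark.executor.memory", "16G")
              .set("spark.network.timeout", "600")
              .set("spark.rpc.message.maxSize", "500"))
      sc = SparkContext(conf=conf)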

      Exception here:

      ---------------------------------------------------------------------------
      timeout                                   Traceback (most recent call last)
      <ipython-input-1-6319dd276401> in <module>()
            2 sc = SparkContext()
            3 rdd = sc.parallelize(range(10))
      ----> 4 [x for x in rdd.toLocalIterator()]
      
      /foo/spark-2.0.1-bin-hadoop2.7/python/pyspark/rdd.pyc in _load_from_socket(port, serializer)
          140     try:
          141         rf = sock.makefile("rb", 65536)
      --> 142         for item in serializer.load_stream(rf):
          143             yield item
          144     finally:
      
      /foo/spark-2.0.1-bin-hadoop2.7/python/pyspark/serializers.pyc in load_stream(self, stream)
          137         while True:
          138             try:
      --> 139                 yield self._read_with_length(stream)
          140             except EOFError:
          141                 return
      
      /foo/spark-2.0.1-bin-hadoop2.7/python/pyspark/serializers.pyc in _read_with_length(self, stream)
          154 
          155     def _read_with_length(self, stream):
      --> 156         length = read_int(stream)
          157         if length == SpecialLengths.END_OF_DATA_SECTION:
          158             raise EOFError
      
      /foo/spark-2.0.1-bin-hadoop2.7/python/pyspark/serializers.pyc in read_int(stream)
          541 
          542 def read_int(stream):
      --> 543     length = stream.read(4)
          544     if not length:
          545         raise EOFError
      
      /usr/lib/python2.7/socket.pyc in read(self, size)
          378                 # fragmentation issues on many platforms.
          379                 try:
      --> 380                     data = self._sock.recv(left)
          381                 except error, e:
          382                     if e.args[0] == EINTR:
      
      timeout: timed out
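
      A possible interim workaround is to pull one partition per job with SparkContext.runJob: each job finishes before its results are served over the local socket, so the driver-side read should not stall waiting on computation, and at most one partition is held on the driver at a time. A sketch only (iter_partitions is a hypothetical helper, not a PySpark API):

      def iter_partitions(rdd):
          # Fetch one partition per Spark job instead of relying on
          # toLocalIterator's streaming socket transfer.
          for i in range(rdd.getNumPartitions()):
              # runJob blocks until partition i is fully computed, then
              # returns its elements to the driver.
              for x in rdd.context.runJob(rdd, lambda it: list(it), [i]):
                  yield x

      [x for x in iter_partitions(rdd)]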
      


    People

      Assignee: L. C. Hsieh (viirya)
      Reporter: Luke Miner (lminer)
      Votes: 1
      Watchers: 9
