Spark / SPARK-2244

pyspark - RDD action hangs (after previously succeeding)


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.1.0
    • Component/s: PySpark
    • Environment:

      system: fedora 20 w/ maven 3.1.1 and openjdk 1.7.0_55 & 1.8.0_05
      code: sha b88238fa (master on 23 june 2014)
      cluster: make-distribution.sh followed by ./dist/sbin/start-all.sh (running locally)

      Description

      $ ./dist/bin/pyspark
      Python 2.7.5 (default, Feb 19 2014, 13:47:28) 
      [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /__ / .__/\_,_/_/ /_/\_\   version 1.0.0-SNAPSHOT
            /_/
      
      Using Python version 2.7.5 (default, Feb 19 2014 13:47:28)
      SparkContext available as sc.
      >>> hundy = sc.parallelize(range(100))
      >>> hundy.count()
      100
      >>> hundy.count()
      100
      >>> hundy.count()
      100
      [repeat until hang, ctrl-C to get]
      >>> hundy.count()
      ^CTraceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 774, in count
          return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
        File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 765, in sum
          return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
        File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 685, in reduce
          vals = self.mapPartitions(func).collect()
        File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 649, in collect
          bytesInJava = self._jrdd.collect().iterator()
        File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 535, in __call__
        File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 363, in send_command
        File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 472, in send_command
        File "/usr/lib64/python2.7/socket.py", line 430, in readline
          data = recv(1)
      KeyboardInterrupt
      

People

• Assignee: andrewor14 (Andrew Or)
• Reporter: farrellee (Matthew Farrellee)
• Votes: 0
• Watchers: 2
