SPARK-2244: pyspark - RDD action hangs (after previously succeeding)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.1.0
    • Component/s: PySpark
    • Environment:
      system: fedora 20 w/ maven 3.1.1 and openjdk 1.7.0_55 & 1.8.0_05
      code: sha b88238fa (master on 23 june 2014)
      cluster: make-distribution.sh followed by ./dist/sbin/start-all.sh (running locally)

Description

      $ ./dist/bin/pyspark
      Python 2.7.5 (default, Feb 19 2014, 13:47:28) 
      [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      Welcome to
            ____              __
           / __/__  ___ _____/ /__
          _\ \/ _ \/ _ `/ __/  '_/
         /__ / .__/\_,_/_/ /_/\_\   version 1.0.0-SNAPSHOT
            /_/
      
      Using Python version 2.7.5 (default, Feb 19 2014 13:47:28)
      SparkContext available as sc.
      >>> hundy = sc.parallelize(range(100))
      >>> hundy.count()
      100
      >>> hundy.count()
      100
      >>> hundy.count()
      100
      [repeat until hang, ctrl-C to get]
      >>> hundy.count()
      ^CTraceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 774, in count
          return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
        File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 765, in sum
          return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
        File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 685, in reduce
          vals = self.mapPartitions(func).collect()
        File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 649, in collect
          bytesInJava = self._jrdd.collect().iterator()
        File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 535, in __call__
        File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 363, in send_command
        File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 472, in send_command
        File "/usr/lib64/python2.7/socket.py", line 430, in readline
          data = recv(1)
      KeyboardInterrupt
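
For reference, the hang can also be driven outside the interactive shell by looping the same action in a standalone script. The sketch below is hypothetical (the file name, app name, and iteration cap are illustrative, not taken from this report); it assumes a local master and the same PySpark API used above (SparkContext, parallelize, count):

      # repro_count_hang.py -- hypothetical standalone reproducer (not from
      # the original report). Loops the same RDD action; the hang appeared
      # after an unpredictable number of successful count() calls, so the
      # cap of 1000 is arbitrary.
      from pyspark import SparkContext

      sc = SparkContext("local[*]", "count-hang-repro")
      hundy = sc.parallelize(range(100))

      for i in range(1000):
          # Each count() runs mapPartitions(...).sum() and collects the
          # result over the Py4J gateway; per the traceback above, the hang
          # is a blocking socket.recv() inside py4j's send_command.
          n = hundy.count()
          assert n == 100, "unexpected count: %d" % n
          print("iteration %d ok" % i)

      sc.stop()

Run it with ./dist/bin/spark-submit repro_count_hang.py (assuming the same make-distribution.sh build); when the bug triggers, the loop stops printing and the process must be interrupted with ctrl-C, producing a traceback like the one above.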
      

People

    Assignee: andrewor14 (Andrew Or)
    Reporter: farrellee (Matthew Farrellee)
    Votes: 0
    Watchers: 2

Dates

    Created:
    Updated:
    Resolved:
