Details
Description
$ ./dist/bin/pyspark
Python 2.7.5 (default, Feb 19 2014, 13:47:28)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.0.0-SNAPSHOT
      /_/

Using Python version 2.7.5 (default, Feb 19 2014 13:47:28)
SparkContext available as sc.
>>> hundy = sc.parallelize(range(100))
>>> hundy.count()
100
>>> hundy.count()
100
>>> hundy.count()
100
[repeat until hang, ctrl-C to get]
>>> hundy.count()
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 774, in count
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 765, in sum
    return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
  File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 685, in reduce
    vals = self.mapPartitions(func).collect()
  File "/home/matt/Documents/Repositories/spark/dist/python/pyspark/rdd.py", line 649, in collect
    bytesInJava = self._jrdd.collect().iterator()
  File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 535, in __call__
  File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 363, in send_command
  File "/home/matt/Documents/Repositories/spark/dist/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 472, in send_command
  File "/usr/lib64/python2.7/socket.py", line 430, in readline
    data = recv(1)
KeyboardInterrupt
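The traceback bottoms out in Py4J's send_command, blocked in socket.readline() -> recv(1) while waiting for a reply from the JVM gateway that never comes; Ctrl-C then surfaces as a KeyboardInterrupt inside that blocking read. The following is a hedged sketch using plain sockets (not Spark or Py4J themselves) that reproduces the shape of that traceback: a stand-in server accepts a connection and never replies, and an interrupt lands inside the blocked readline:

```python
import os
import signal
import socket
import threading

# Stand-in for the hung Py4J gateway: accepts a connection, never replies.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
client = socket.create_connection(server.getsockname())
conn, _ = server.accept()  # hold the connection open; send nothing

# Simulate the user's Ctrl-C half a second into the blocked read.
threading.Timer(0.5, lambda: os.kill(os.getpid(), signal.SIGINT)).start()

interrupted = False
try:
    client.makefile("rb").readline()  # blocks in recv, like py4j's send_command
except KeyboardInterrupt:
    interrupted = True  # same exception the report's traceback ends with
finally:
    conn.close()
    client.close()
    server.close()

print("interrupted:", interrupted)
```

This only demonstrates how the KeyboardInterrupt ends up inside socket.py's readline; the underlying bug is whatever leaves the gateway (or the Spark job behind it) hung so that the reply never arrives.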
Issue Links
- duplicates: SPARK-2242 Running sc.parallelize(..).count() hangs pyspark (Resolved)
- is duplicated by: SPARK-2242 Running sc.parallelize(..).count() hangs pyspark (Resolved)