SPARK-1712: ParallelCollectionRDD operations hanging forever without any error messages


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.2, 1.0.1
    • Component/s: Spark Core
    • Labels: None
    • Environment: Linux Ubuntu 14.04, a single Spark node; standalone mode.

    Description

      conf/spark-defaults.conf

      spark.akka.frameSize         5
      spark.default.parallelism    1
      
      scala> val collection = (1 to 1000000).map(i => ("foo" + i, i)).toVector
      collection: Vector[(String, Int)] = Vector((foo1,1), (foo2,2), (foo3,3), (foo4,4), (foo5,5), (foo6,6), (foo7,7), (foo8,8), (foo...
      
      scala> val rdd = sc.parallelize(collection)
      rdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:24
      
      scala> rdd.first
      res4: (String, Int) = (foo1,1)
      
      scala> rdd.map(_._2).sum
      // nothing happens
      
      

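      For reference, spark.akka.frameSize is given in MB, so the limit above is 5 MB. The same settings can also be applied programmatically when building the context; a minimal sketch, where the master URL and app name are placeholders:

      import org.apache.spark.{SparkConf, SparkContext}

      // Equivalent programmatic configuration; "spark://master:7077" stands in
      // for the actual standalone master URL.
      val conf = new SparkConf()
        .setMaster("spark://master:7077")
        .setAppName("repro-SPARK-1712")
        .set("spark.akka.frameSize", "5")        // frame size limit, in MB
        .set("spark.default.parallelism", "1")   // a single partition/task
      val sc = new SparkContext(conf)
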
      CPU and I/O are idle.
      Memory usage reported by the JVMs after a manually triggered GC:
      repl: 216 MB / 2 GB
      executor: 67 MB / 2 GB
      worker: 6 MB / 128 MB
      master: 6 MB / 128 MB

      No errors found in the worker's stderr/stdout.

      It works fine with 700,000 elements: the sum is computed in about 1 second, and the executor's memory usage stays below 300 MB of the 2 GB available. With 800,000 elements the job hangs.
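
      A plausible explanation, not confirmed here, is that the serialized task (which, for a parallelized collection, carries the partition data itself) crosses the 5 MB spark.akka.frameSize limit somewhere between 700,000 and 800,000 elements, and the oversized Akka message is dropped without any error. The serialized size is easy to estimate with plain Java serialization:

      import java.io.{ByteArrayOutputStream, ObjectOutputStream}

      // Measure the Java-serialized size of the collection that gets shipped
      // inside the task, for comparison against the 5 MB frame size.
      def serializedMB(n: Int): Double = {
        val data = (1 to n).map(i => ("foo" + i, i)).toVector
        val bos = new ByteArrayOutputStream()
        val oos = new ObjectOutputStream(bos)
        oos.writeObject(data)
        oos.close()
        bos.size / (1024.0 * 1024.0)
      }

      println(f"700k: ${serializedMB(700000)}%.1f MB, 800k: ${serializedMB(800000)}%.1f MB")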

      Multiple parallelized collections of 700,000 items each, used at the same time in the same session, work fine.
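
      If the frame-size hypothesis above is correct, two workarounds suggest themselves: raise spark.akka.frameSize, or spread the collection over more partitions so each task carries less data. A sketch (the slice count of 16 is an arbitrary choice):

      // Workaround sketches, assuming oversized task messages are the cause:
      // (1) raise the frame limit in conf/spark-defaults.conf, e.g.
      //       spark.akka.frameSize    64
      // (2) or split the data across more partitions so each task stays small:
      val rdd = sc.parallelize(collection, 16)   // 16 slices instead of 1
      rdd.map(_._2).sum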

    Attachments

      1. executor.jstack.txt (50 kB, Piotr Kolaczkowski)
      2. master.jstack.txt (29 kB, Piotr Kolaczkowski)
      3. repl.jstack.txt (94 kB, Piotr Kolaczkowski)
      4. spark-hang.png (133 kB, Piotr Kolaczkowski)
      5. worker.jstack.txt (33 kB, Piotr Kolaczkowski)


    People

      Assignee: Guoqiang Li (gq)
      Reporter: Piotr Kolaczkowski (pkolaczk)
      Votes: 0
      Watchers: 4
