Spark / SPARK-1712

ParallelCollectionRDD operations hanging forever without any error messages

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.2, 1.0.1
    • Component/s: Spark Core
    • Labels: None
    • Environment: Ubuntu Linux 14.04, a single Spark node in standalone mode.

      Description

      conf/spark-defaults.conf

      spark.akka.frameSize         5
      spark.default.parallelism    1
      
      scala> val collection = (1 to 1000000).map(i => ("foo" + i, i)).toVector
      collection: Vector[(String, Int)] = Vector((foo1,1), (foo2,2), (foo3,3), (foo4,4), (foo5,5), (foo6,6), (foo7,7), (foo8,8), (foo9,9), (foo10,10), (foo11,11), (foo12,12), (foo13,13), (foo14,14), (foo15,15), (foo16,16), (foo17,17), (foo18,18), (foo19,19), (foo20,20), (foo21,21), (foo22,22), (foo23,23), (foo24,24), (foo25,25), (foo26,26), (foo27,27), (foo28,28), (foo29,29), (foo30,30), (foo31,31), (foo32,32), (foo33,33), (foo34,34), (foo35,35), (foo36,36), (foo37,37), (foo38,38), (foo39,39), (foo40,40), (foo41,41), (foo42,42), (foo43,43), (foo44,44), (foo45,45), (foo46,46), (foo47,47), (foo48,48), (foo49,49), (foo50,50), (foo51,51), (foo52,52), (foo53,53), (foo54,54), (foo55,55), (foo56,56), (foo57,57), (foo58,58), (foo59,59), (foo60,60), (foo61,61), (foo62,62), (foo63,63), (foo64,64), (foo...
      
      scala> val rdd = sc.parallelize(collection)
      rdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:24
      
      scala> rdd.first
      res4: (String, Int) = (foo1,1)
      
      scala> rdd.map(_._2).sum
      // nothing happens
      
      

      CPU and I/O idle.
      Memory usage reported by the JVM after a manually triggered GC:
      repl: 216 MB / 2 GB
      executor: 67 MB / 2 GB
      worker: 6 MB / 128 MB
      master: 6 MB / 128 MB

      No errors found in worker's stderr/stdout.
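
      Because the hanging action returns neither a result nor an error, one way to make the hang visible from the shell is to bound the call with a timeout. The sketch below is only an illustration under stated assumptions: `HangProbe` and `withTimeout` are hypothetical names, not part of Spark, and the helper uses nothing beyond `scala.concurrent` from the standard library. It does not cancel the underlying job or explain the cause; it only stops waiting.

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical helper, not part of Spark.
object HangProbe {
  // Runs `body` on another thread and waits at most `limit` for its result.
  // Returns Right(result) if the body finishes in time and Left(message) if
  // it does not. After a timeout the body keeps running in the background;
  // nothing is cancelled. Exceptions thrown by the body are rethrown here.
  def withTimeout[T](limit: FiniteDuration)(body: => T): Either[String, T] =
    try Right(Await.result(Future(body), limit))
    catch {
      case _: TimeoutException => Left(s"no result after $limit")
    }
}
```

      In the session above this could be used as `HangProbe.withTimeout(30.seconds) { rdd.map(_._2).sum }`, where the 30-second limit is an arbitrary choice. A `Left` would indicate that the action is still blocked, which is a convenient moment to capture thread dumps like the attached jstack files.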

      It works fine with 700,000 elements: the request completes and the sum is returned in about 1 second, and executor memory stays below 300 MB of the 2 GB available. It fails with 800,000 elements.

      Multiple parallelized collections of 700,000 elements each, used at the same time in the same session, work fine.

        Attachments

        1. worker.jstack.txt (33 kB, Piotr Kolaczkowski)
        2. spark-hang.png (133 kB, Piotr Kolaczkowski)
        3. repl.jstack.txt (94 kB, Piotr Kolaczkowski)
        4. master.jstack.txt (29 kB, Piotr Kolaczkowski)
        5. executor.jstack.txt (50 kB, Piotr Kolaczkowski)

              People

              • Assignee: Guoqiang Li (gq)
              • Reporter: Piotr Kolaczkowski (pkolaczk)
