SPARK-1045

ExternalAppendOnlyMap iterator throws NoSuchElementException when joining two large RDDs

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.9.1, 1.0.0
    • Component/s: Shuffle, Spark Core
    • Labels: None

    Description

      On the latest master branch (commit 05be7047744c88e64e7e6bd973f9bcfacd00da5f), I keep getting a java.util.NoSuchElementException from a single shuffle task when performing a join on two large RDDs (memory spill 10 GB, disk spill 800 MB for that single task).

      The code is here:
      https://gist.github.com/guojc/8643741

      The exception is:

      java.util.NoSuchElementException (java.util.NoSuchElementException)
      org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:304)
      org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:239)
      org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
      scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
      scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
      org.apache.spark.rdd.PairRDDFunctions.org$apache$spark$rdd$PairRDDFunctions$$writeToFile$1(PairRDDFunctions.scala:724)
      org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:734)
      org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$2.apply(PairRDDFunctions.scala:734)
      org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
      org.apache.spark.scheduler.Task.run(Task.scala:53)
      org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
      org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
      org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
      java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
      java.lang.Thread.run(Thread.java:662)
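
For readers following the trace: the top two frames are in ExternalAppendOnlyMap's ExternalIterator.next, which, as far as I understand it, merges the in-memory map with the spilled maps. The Java sketch below is not Spark's code and does not claim to reproduce this bug. It is a hypothetical, minimal merging iterator (the names MergeSketch, Source, MergingIterator and drain are invented for illustration) that shows the general pattern, and where a NoSuchElementException surfaces when next() is called with nothing left to merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

/** Hypothetical sketch of a k-way merge over sorted streams; not Spark's implementation. */
public class MergeSketch {

    /** One sorted input stream together with its current head element. */
    private static final class Source {
        final Iterator<Integer> it;
        int head;

        Source(Iterator<Integer> it) {
            this.it = it;
            this.head = it.next(); // caller guarantees the stream is non-empty
        }
    }

    /** Yields the smallest remaining element across all sources on each call. */
    public static final class MergingIterator implements Iterator<Integer> {
        private final PriorityQueue<Source> heap =
            new PriorityQueue<>(Comparator.comparingInt((Source s) -> s.head));

        public MergingIterator(List<List<Integer>> sortedInputs) {
            for (List<Integer> input : sortedInputs) {
                Iterator<Integer> it = input.iterator();
                if (it.hasNext()) {
                    heap.add(new Source(it)); // empty inputs are skipped
                }
            }
        }

        @Override
        public boolean hasNext() {
            return !heap.isEmpty();
        }

        @Override
        public Integer next() {
            Source s = heap.poll();
            if (s == null) {
                // Nothing left to merge: this is the failure mode seen in the trace.
                throw new NoSuchElementException();
            }
            int value = s.head;
            if (s.it.hasNext()) {
                s.head = s.it.next(); // advance this source and put it back
                heap.add(s);
            }
            return value;
        }
    }

    /** Consumes the iterator safely by checking hasNext() before every next(). */
    public static List<Integer> drain(Iterator<Integer> it) {
        List<Integer> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        Iterator<Integer> merged = new MergingIterator(Arrays.asList(
            Arrays.asList(1, 4),
            Arrays.asList(2, 3),
            Collections.<Integer>emptyList()));
        System.out.println(drain(merged));
        try {
            merged.next(); // exhausted: throws
        } catch (NoSuchElementException e) {
            System.out.println("next() on an exhausted merge threw " + e);
        }
    }
}
```

In a merge like this, the exception only appears if next() is reached while the heap is empty, so a report such as this one suggests either a caller skipping hasNext() or the iterator's own bookkeeping disagreeing with its hasNext(). Which of these applies here is not established by the trace alone.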

            People

              Assignee: Unassigned
              Reporter: Jiacheng Guo (guojc)
              Votes: 0
              Watchers: 2
