Crunch (Retired) / CRUNCH-568

Aggregators fail on SparkPipeline

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.14.0
    • Component/s: Spark
    • Labels: None

    Description

      Logging this based on a mailing list discussion:
      http://mail-archives.apache.org/mod_mbox/crunch-user/201510.mbox/%3CCANb5z2KBqxZng92ToFo0MdTk2fd8jtGTjZ85h1yUo_akaetcXg%40mail.gmail.com%3E

      Running a Crunch SparkPipeline with the FirstN aggregator results in a NullPointerException.

      Example to reproduce the issue:

      https://gist.github.com/nasokan/853ff80ce20ad7a78886

      Stack trace from the driver logs:

      15/10/05 16:02:33 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 0, 123.domain.xyz): java.lang.NullPointerException
          at org.apache.crunch.impl.mr.run.UniformHashPartitioner.getPartition(UniformHashPartitioner.java:32)
          at org.apache.crunch.impl.spark.fn.PartitionedMapOutputFunction.call(PartitionedMapOutputFunction.java:62)
          at org.apache.crunch.impl.spark.fn.PartitionedMapOutputFunction.call(PartitionedMapOutputFunction.java:35)
          at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
          at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
          at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
          at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
          at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
          at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
          at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
          at org.apache.spark.scheduler.Task.run(Task.scala:64)
          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)
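
      The trace shows the exception being thrown inside UniformHashPartitioner.getPartition, but it does not say which reference was null. One plausible reading, which is an assumption here and not something this report confirms, is that a null key reached a hash-based partitioner. The sketch below uses a hypothetical PartitionSketch class (not Crunch's actual source) to show how a partitioner that calls hashCode() on the key fails for a null key, and how a null-tolerant variant avoids it.

```java
import java.util.Objects;

// Hypothetical stand-in for a hash partitioner; NOT the Crunch implementation.
final class PartitionSketch {

    // Naive variant: dereferences the key, so a null key throws NullPointerException.
    static int naivePartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Null-tolerant variant: Objects.hashCode returns 0 when the key is null.
    static int nullSafePartition(Object key, int numPartitions) {
        return (Objects.hashCode(key) & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // A null key is routed to a valid partition by the null-tolerant variant.
        System.out.println(nullSafePartition(null, 4));
        try {
            naivePartition(null, 4);
        } catch (NullPointerException e) {
            System.out.println("naive variant threw NullPointerException for a null key");
        }
    }
}
```

      For non-null keys both variants return the same partition; they differ only in how a null key is handled.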
      

      Attachments

        1. CRUNCH-568b.patch
          3 kB
          Josh Wills
        2. CRUNCH-568a.patch
          3 kB
          Micah Whitacre
        3. CRUNCH-568.patch
          1 kB
          Josh Wills

          People

            Assignee: Micah Whitacre (mkwhitacre)
            Reporter: Nithin Asokan (nithinasokan)
            Votes: 0
            Watchers: 4