Crunch / CRUNCH-568

Aggregators fail on SparkPipeline


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.12.0
    • Fix Version/s: 0.14.0
    • Component/s: Spark
    • Labels: None

    Description

      Logging this based on a mailing list discussion:
      http://mail-archives.apache.org/mod_mbox/crunch-user/201510.mbox/%3CCANb5z2KBqxZng92ToFo0MdTk2fd8jtGTjZ85h1yUo_akaetcXg%40mail.gmail.com%3E

      Running a Crunch SparkPipeline with the FirstN aggregator results in a NullPointerException.

      Example to reproduce this:

      https://gist.github.com/nasokan/853ff80ce20ad7a78886

      Stack trace from the driver logs:

      15/10/05 16:02:33 WARN TaskSetManager: Lost task 3.0 in stage 0.0 (TID 0, 123.domain.xyz): java.lang.NullPointerException
          at org.apache.crunch.impl.mr.run.UniformHashPartitioner.getPartition(UniformHashPartitioner.java:32)
          at org.apache.crunch.impl.spark.fn.PartitionedMapOutputFunction.call(PartitionedMapOutputFunction.java:62)
          at org.apache.crunch.impl.spark.fn.PartitionedMapOutputFunction.call(PartitionedMapOutputFunction.java:35)
          at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
          at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:1002)
          at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
          at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
          at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:366)
          at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:211)
          at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
          at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
          at org.apache.spark.scheduler.Task.run(Task.scala:64)
          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
          at java.lang.Thread.run(Thread.java:745)
      

    Attachments

      1. CRUNCH-568.patch (1 kB, Josh Wills)
      2. CRUNCH-568a.patch (3 kB, Micah Whitacre)
      3. CRUNCH-568b.patch (3 kB, Josh Wills)

    People

      • Assignee: Micah Whitacre (mkwhitacre)
      • Reporter: Nithin Asokan (nithinasokan)
      • Votes: 0
      • Watchers: 4
