SPARK-2292 (project: Spark)

NullPointerException in JavaPairRDD.mapToPair

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Environment: Spark 1.0.0 in standalone mode, with the master and a single slave running on Ubuntu on a laptop. 4 GB of memory and 8 cores were available to the executor.

    Description

      Invoking JavaPairRDD.mapToPair results in a NullPointerException (NPE):

      14/06/26 21:05:35 WARN scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException
      java.lang.NullPointerException
      	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750)
      	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
      	at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
      	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
      	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
      	at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
      	at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
      	at org.apache.spark.scheduler.Task.run(Task.scala:51)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      	at java.lang.Thread.run(Thread.java:722)
      

      This occurs only after migrating to the 1.0.0 API. The code and the data file used for testing are included in this gist: https://gist.github.com/reachbach/d8977c8eb5f71f889301
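      In the trace above, the user's PairFunction does not run at the point where mapToPair is called. It runs later, through Spark's internal pairFunToScalaFun adapter, when Aggregator.combineValuesByKey pulls records inside a ShuffleMapTask, because RDD transformations are evaluated lazily. The plain-Java sketch below is a hypothetical, simplified model of that call shape only. The names LazyPairPipeline, Pair, mapToPair and combineValuesByKey are illustrative assumptions, not Spark's API or source, and the sketch does not diagnose the NPE reported here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical, simplified model of the call path in the trace; not Spark's source.
public class LazyPairPipeline {

    // Stand-in for scala.Tuple2 (illustrative only).
    static final class Pair<K, V> {
        final K key;
        final V value;
        Pair(K key, V value) { this.key = key; this.value = value; }
    }

    // Lazily applies f as the iterator is consumed, loosely mirroring how a
    // wrapped PairFunction is mapped over a partition iterator.
    static <T, K, V> Iterator<Pair<K, V>> mapToPair(Iterator<T> input, Function<T, Pair<K, V>> f) {
        return new Iterator<Pair<K, V>>() {
            @Override public boolean hasNext() { return input.hasNext(); }
            @Override public Pair<K, V> next() { return f.apply(input.next()); }
        };
    }

    // Map-side combine, loosely modeled on Aggregator.combineValuesByKey:
    // each records.next() call is where the mapping function actually runs.
    static <K, V> Map<K, V> combineValuesByKey(Iterator<Pair<K, V>> records, BinaryOperator<V> merge) {
        Map<K, V> combiners = new HashMap<>();
        while (records.hasNext()) {
            Pair<K, V> kv = records.next();
            combiners.merge(kv.key, kv.value, merge);
        }
        return combiners;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a");
        // Declaring the mapping does no work yet.
        Iterator<Pair<String, Integer>> pairs = mapToPair(words.iterator(), w -> new Pair<>(w, 1));
        System.out.println(combineValuesByKey(pairs, Integer::sum));

        // A null element makes the mapping function throw, but only once the combine consumes it.
        List<String> withNull = Arrays.asList("a", null);
        Iterator<Pair<String, Integer>> lazy = mapToPair(withNull.iterator(), w -> new Pair<>(w, w.length()));
        System.out.println("mapping declared, nothing thrown yet");
        try {
            combineValuesByKey(lazy, Integer::sum);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException raised while combining values by key");
        }
    }
}
```

      In this model the second pipeline is declared without error and only throws when combineValuesByKey consumes the null element, which is why an exception from the mapping step surfaces under the shuffle/combine frames rather than at the call site.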

      Attachments

        1. SPARK-2292-aash-repro.tar.gz (64 kB), attached by Andrew Ash

            People

              Assignee: Unassigned
              Reporter: Bharath Ravi Kumar (reachbach)
              Votes: 0
              Watchers: 7