  Spark / SPARK-2292

NullPointerException in JavaPairRDD.mapToPair


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.0.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels:
      None
    • Environment:

      Spark 1.0.0 in standalone mode, with the master and a single slave running on Ubuntu on a laptop. 4 GB of memory and 8 cores were available to the executor.

      Description

      Correction: Invoking JavaPairRDD.mapToPair results in an NPE:

      14/06/26 21:05:35 WARN scheduler.TaskSetManager: Loss was due to java.lang.NullPointerException
      java.lang.NullPointerException
      	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750)
      	at org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.apply(JavaPairRDD.scala:750)
      	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
      	at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
      	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
      	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
      	at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
      	at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:582)
      	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
      	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
      	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
      	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
      	at org.apache.spark.scheduler.Task.run(Task.scala:51)
      	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      	at java.lang.Thread.run(Thread.java:722)
      

      This occurs only after migrating to the 1.0.0 API. The code and the data file used for testing are included in this gist: https://gist.github.com/reachbach/d8977c8eb5f71f889301
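      The top frame of the trace is the anonymous function created in JavaPairRDD.pairFunToScalaFun, not any user code. The failure surfaces inside a ShuffleMapTask on the executor because mapToPair is a lazy transformation: its function only runs when a task consumes the partition. The plain-Java sketch below has no Spark dependency, and all of its names (PairFn, wrap, runTask) are hypothetical stand-ins, not Spark APIs. Purely for illustration, and not as the confirmed cause of this issue (which was closed as Not A Problem), it assumes that the wrapped function reference is null. Wrapping succeeds, and the NPE is thrown only later, when the adapter is applied to a record.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, Spark-free sketch: none of these names are Spark APIs.
public class LazyPairAdapterSketch {

    // Stand-in for a user-supplied pair function (akin to Spark's PairFunction).
    interface PairFn<T, K, V> {
        Map.Entry<K, V> call(T t);
    }

    // Stand-in for an adapter such as pairFunToScalaFun: it captures f and
    // dereferences it only when the returned function is applied.
    static <T, K, V> Function<T, Map.Entry<K, V>> wrap(PairFn<T, K, V> f) {
        return x -> f.call(x); // throws NullPointerException here if f is null
    }

    // Stand-in for task-side code that lazily consumes one partition.
    static <T, K, V> List<Map.Entry<K, V>> runTask(Iterator<T> partition,
                                                   Function<T, Map.Entry<K, V>> g) {
        List<Map.Entry<K, V>> out = new ArrayList<>();
        while (partition.hasNext()) {
            out.add(g.apply(partition.next()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("a,1", "b,2");

        // A non-null function behaves as expected.
        PairFn<String, String, Integer> parse = s -> {
            String[] parts = s.split(",");
            return new SimpleEntry<>(parts[0], Integer.parseInt(parts[1]));
        };
        System.out.println(runTask(data.iterator(), wrap(parse))); // [a=1, b=2]

        // Wrapping a null reference succeeds; the failure appears only when applied.
        Function<String, Map.Entry<String, Integer>> broken = wrap(null);
        try {
            runTask(data.iterator(), broken);
        } catch (NullPointerException e) {
            System.out.println("NPE raised while running the task");
        }
    }
}
```

      The sketch deliberately uses a lambda (x -> f.call(x)). A method reference such as f::call would instead throw the NPE eagerly, at the point where the adapter is created.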

        Attachments

        1. SPARK-2292-aash-repro.tar.gz (64 kB, Andrew Ash)


              People

              • Assignee: Unassigned
              • Reporter: Bharath Ravi Kumar (reachbach)
              • Votes: 0
              • Watchers: 8
