Spark / SPARK-20404

Regression with accumulator names when migrating from 1.6 to 2.x


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.0.1, 2.0.2, 2.1.0
    • Fix Version/s: 2.1.2, 2.2.0
    • Component/s: Spark Core
    • Labels: None
    • Environment: Spark: 2.1
      Scala: 2.11
      Spark master: local

    • Flags: Patch

    Description

      Creating an accumulator with its name explicitly set to null, as in the following:

      sparkContext.accumulator(0, null)
      

      throws an exception at runtime:

      ERROR | DAGScheduler | dag-scheduler-event-loop | Failed to update accumulators for task 0
      java.lang.NullPointerException
      	at org.apache.spark.util.AccumulatorV2$$anonfun$1.apply(AccumulatorV2.scala:106)
      	at org.apache.spark.util.AccumulatorV2$$anonfun$1.apply(AccumulatorV2.scala:106)
      	at scala.Option.exists(Option.scala:240)
      	at org.apache.spark.util.AccumulatorV2.toInfo(AccumulatorV2.scala:106)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1091)
      	at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1080)
      	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
      	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
      	at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1080)
      	at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1183)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1647)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
      	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
      	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
      

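      The cause is that the name is wrapped in Some(name) instead of Option(name) when the accumulator is created, so a null name becomes Some(null) rather than None, and the later name check in AccumulatorV2.toInfo dereferences it.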

      A patch is available.
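
      The difference between Some and Option for a null name can be reproduced without Spark. The following is a minimal sketch, not the actual Spark code or the attached patch: the object AccumulatorNameDemo, the helpers wrapWithSome, wrapWithOption and isInternal, and the "internal." prefix are all hypothetical names chosen for illustration. It only mimics the pattern visible in the stack trace, where Option.exists applies a predicate to the wrapped name.

```scala
object AccumulatorNameDemo {
  // Hypothetical helpers (not Spark API): the two ways of wrapping a possibly-null name.
  def wrapWithSome(name: String): Option[String] = Some(name)     // null stays wrapped as Some(null)
  def wrapWithOption(name: String): Option[String] = Option(name) // null is converted to None

  // Mimics a check that calls a method on the wrapped name via Option.exists.
  // The prefix value is an assumption used only for this illustration.
  def isInternal(name: Option[String]): Boolean =
    name.exists(_.startsWith("internal."))

  def main(args: Array[String]): Unit = {
    // Option(null) is None, so the predicate is never invoked.
    println(isInternal(wrapWithOption(null)))

    // Some(null) is non-empty, so the predicate runs on null and throws.
    try {
      isInternal(wrapWithSome(null))
    } catch {
      case _: NullPointerException =>
        println("NullPointerException from Some(null)")
    }
  }
}
```

      Under these assumptions, wrapping with Option makes a null name behave like an unnamed accumulator, while wrapping with Some defers the failure to the first place the name is inspected.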


          People

            Assignee: Sergey Zhemzhitsky (szhemzhitsky)
            Reporter: Sergey Zhemzhitsky (szhemzhitsky)
            Votes: 0
            Watchers: 2
