SPARK-3150: NullPointerException in Spark standalone recovery after simultaneous failure of master and driver

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 1.0.3, 1.1.1
    • Component/s: Spark Core
    • Labels: None
    • Environment: Linux 3.2.0-23-generic x86_64

    Description

      The issue happens when Spark runs in standalone mode on a cluster.

      When the master and the driver fail simultaneously on the same cluster node, the master tries to recover its state and restart the Spark driver.
      While restarting the driver, the master crashes with a NullPointerException (stack trace below).
      After crashing, the master restarts, tries to recover its state, and restarts the Spark driver again; this repeats in an infinite cycle.

      Specifically, the master reads the persisted DriverInfo state back from ZooKeeper, but after deserialization the DriverInfo.worker field turns out to be null.
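
      A plausible mechanism (an assumption based on the stack trace, not something stated elsewhere in this report) is that DriverInfo.worker is a @transient field: Java serialization skips transient fields, and field initializers do not rerun on deserialization, so a field declared as an Option defaulting to None comes back as null. A minimal self-contained Scala sketch of that behavior (DriverInfoLike and TransientNullDemo are hypothetical names, not Spark classes):

      import java.io._

      // Minimal model of the suspected problem: a @transient field of a
      // serializable class is skipped when the object is written, and its
      // initializer does not rerun when the object is read back, so the
      // default value (None) is lost and the field comes back as null.
      class DriverInfoLike extends Serializable {
        @transient var worker: Option[String] = None
      }

      object TransientNullDemo {
        def main(args: Array[String]): Unit = {
          // Round-trip the object through Java serialization, as a
          // ZooKeeper-backed persistence engine would on recovery.
          val buf = new ByteArrayOutputStream()
          val out = new ObjectOutputStream(buf)
          out.writeObject(new DriverInfoLike)
          out.close()
          val restored = new ObjectInputStream(
            new ByteArrayInputStream(buf.toByteArray)).readObject()
            .asInstanceOf[DriverInfoLike]
          println(restored.worker)          // prints null, not None
          println(restored.worker.isEmpty)  // throws java.lang.NullPointerException
        }
      }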

      Stack trace (observed on version 1.0.0, but reproducible on version 1.0.2 as well):

      [2014-08-14 21:44:59,519] ERROR (akka.actor.OneForOneStrategy)
      java.lang.NullPointerException
      at org.apache.spark.deploy.master.Master$$anonfun$completeRecovery$5.apply(Master.scala:448)
      at org.apache.spark.deploy.master.Master$$anonfun$completeRecovery$5.apply(Master.scala:448)
      at scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
      at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
      at scala.collection.TraversableLike$class.filter(TraversableLike.scala:263)
      at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
      at org.apache.spark.deploy.master.Master.completeRecovery(Master.scala:448)
      at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:376)
      at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
      at akka.actor.ActorCell.invoke(ActorCell.scala:456)
      at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
      at akka.dispatch.Mailbox.run(Mailbox.scala:219)
      at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
      at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
      at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
      at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
      at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
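
      Since the NPE surfaces in a collection filter inside Master.completeRecovery (Master.scala:448 in the trace above), one defensive direction is to re-initialize the field when the object is deserialized. The following is a sketch under the same transient-field assumption as above, not the actual Spark patch; RecoverableDriverInfo and RecoveryFixDemo are hypothetical names:

      import java.io._

      class RecoverableDriverInfo extends Serializable {
        @transient var worker: Option[String] = None

        // Hook invoked by Java serialization when the object is
        // reconstructed; restore the default that deserialization lost.
        private def readObject(in: ObjectInputStream): Unit = {
          in.defaultReadObject()
          if (worker == null) worker = None
        }
      }

      object RecoveryFixDemo {
        def main(args: Array[String]): Unit = {
          val buf = new ByteArrayOutputStream()
          val out = new ObjectOutputStream(buf)
          out.writeObject(new RecoverableDriverInfo)
          out.close()
          val restored = new ObjectInputStream(
            new ByteArrayInputStream(buf.toByteArray)).readObject()
            .asInstanceOf[RecoverableDriverInfo]
          // No NPE: the field is None again, so filters over it are safe.
          println(restored.worker.isEmpty)  // true
        }
      }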

      How to reproduce: run Spark standalone on a cluster and, on a node where the driver runs, kill all Spark processes at once (driver, master, and worker simultaneously).


    People

      Assignee: Unassigned
      Reporter: Tatiana Borisova (tanyatik)
      Votes: 0
      Watchers: 2
