SPARK-6449

Driver OOM results in reported application result SUCCESS


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.3.0
    • Fix Version/s: None
    • Component/s: YARN
    • Labels: None

    Description

      Yesterday I ran a job that, according to both the History Server and the YARN RM, finished with status SUCCESS.

      Clicking around the History Server UI, I saw that too few stages had run, and I couldn't figure out why.

      Finally, inspecting the end of the driver's logs, I saw:

      15/03/20 15:08:13 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
      15/03/20 15:08:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
      15/03/20 15:08:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
      15/03/20 15:08:13 INFO spark.SparkContext: Successfully stopped SparkContext
      Exception in thread "Driver" scala.MatchError: java.lang.OutOfMemoryError: GC overhead limit exceeded (of class java.lang.OutOfMemoryError)
              at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:485)
      15/03/20 15:08:13 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0, (reason: Shutdown hook called before final status was reported.)
      15/03/20 15:08:13 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED (diag message: Shutdown hook called before final status was reported.)
      15/03/20 15:08:13 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
      15/03/20 15:08:13 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
      15/03/20 15:08:13 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1426705269584_0055
      

      The driver ran out of memory, and the catch block that presumably should have handled the error threw a scala.MatchError instead. The ApplicationMaster's shutdown hook then reported SUCCEEDED to YARN, and that status was written to the event log.

      This should be logged as a failed job and reported as such to YARN.
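
      To illustrate the failure mode described above, here is a minimal sketch of how a pattern match that only covers Exception subtypes throws a scala.MatchError when handed an OutOfMemoryError (which is a java.lang.Error, not an Exception). The object and method names (MatchErrorSketch, statusNarrow, statusWide) and the status strings are hypothetical and are not taken from ApplicationMaster.scala; the actual handler at line 485 may look different.

```scala
object MatchErrorSketch {
  // Hypothetical handler (assumption): matches only Exception subtypes.
  // An OutOfMemoryError is an Error, so no case applies and Scala
  // throws scala.MatchError from inside the handler itself.
  def statusNarrow(cause: Throwable): String = cause match {
    case _: InterruptedException => "INTERRUPTED"
    case _: Exception            => "FAILED"
  }

  // Hypothetical alternative (assumption): a final Throwable case makes the
  // match total, so an Error is also mapped to a failed status.
  def statusWide(cause: Throwable): String = cause match {
    case _: InterruptedException => "INTERRUPTED"
    case _: Throwable            => "FAILED"
  }

  def main(args: Array[String]): Unit = {
    val oom = new OutOfMemoryError("GC overhead limit exceeded")

    // The narrow handler fails with a MatchError instead of returning a status.
    val narrow =
      try statusNarrow(oom)
      catch { case m: MatchError => "MatchError: " + m.getMessage }
    println(narrow)

    // The wide handler returns a failed status for the same cause.
    println(statusWide(oom))
  }
}
```

      With the narrow match, the handler itself dies before any final status is reported, which is consistent with the "Shutdown hook called before final status was reported" message in the log above. Whether a catch-all Throwable case is the right fix is a design question for the duplicate issue; the sketch only shows why the MatchError appears.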

    People

      Assignee: Unassigned
      Reporter: Ryan Williams (rdub)