Spark / SPARK-41313

AM shutdown hook fails with IllegalStateException if AM crashes on startup (recurrence of SPARK-3900)


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: Spark Core, YARN
    • Labels: None

    Description

      SPARK-3900 fixed the IllegalStateException thrown by cleanupStagingDir() in the ApplicationMaster's shutdown hook. However, SPARK-21138 accidentally reverted that change while fixing the "Wrong FS" bug, and our users at LinkedIn are now hitting the SPARK-3900 failure again. We need to bring back the SPARK-3900 fix.

      The IllegalStateException is raised while creating a new FileSystem object, because Hadoop does not allow a shutdown hook to be registered once shutdown is already in progress. When a Spark job fails before launch, cleanupStagingDir() is called as part of shutdown. If it creates a FileSystem object for the first time at that point, HDFS tries to register a hook to shut down the KeyProviderCache while creating the ClientContext for the DFSClient, and the IllegalStateException is thrown. cleanupStagingDir() should therefore avoid creating a new FileSystem object when it is called from a shutdown hook. SPARK-3900 did exactly that; restoring its fix, which SPARK-21138 reverted, avoids the IllegalStateException.
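      The failure mode described above can be modeled in a few lines. The following is a minimal, self-contained Java sketch, not Spark or Hadoop code: HookManager, FakeFileSystem and simulate are hypothetical names. HookManager stands in for Hadoop's shutdown-hook manager and, like it, rejects new hooks once shutdown has begun; FakeFileSystem stands in for a filesystem client whose first construction registers its own shutdown hook (as the DFSClient does for the KeyProviderCache). Creating the filesystem lazily inside the cleanup hook reproduces the IllegalStateException, while creating it before shutdown starts lets the cleanup finish.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the bug. None of these names are real
// Hadoop or Spark APIs; they only mirror the behavior described in the issue.
public class ShutdownHookSketch {

    /** Hypothetical hook manager: refuses new hooks once shutdown has started. */
    static final class HookManager {
        private final List<Runnable> hooks = new ArrayList<>();
        private boolean shutdownInProgress = false;

        void addShutdownHook(Runnable hook) {
            if (shutdownInProgress) {
                throw new IllegalStateException("Shutdown in progress, cannot add a shutdown hook");
            }
            hooks.add(hook);
        }

        /** Marks shutdown as started, then runs every hook registered so far. */
        void runShutdown() {
            shutdownInProgress = true;
            for (Runnable hook : new ArrayList<>(hooks)) {
                hook.run();
            }
        }
    }

    /** Hypothetical filesystem client whose constructor registers its own cleanup hook. */
    static final class FakeFileSystem {
        FakeFileSystem(HookManager manager) {
            // Stands in for registering a hook that shuts down a key-provider cache.
            manager.addShutdownHook(() -> { });
        }

        boolean delete(String path) {
            return true; // pretend the staging directory was removed
        }
    }

    /** Returns true if the staging-dir cleanup succeeded during shutdown. */
    static boolean simulate(boolean createFsEagerly) {
        HookManager manager = new HookManager();
        // Eager variant: the filesystem (and its hook) exists before shutdown begins.
        FakeFileSystem eagerFs = createFsEagerly ? new FakeFileSystem(manager) : null;
        boolean[] cleaned = {false};
        manager.addShutdownHook(() -> {
            try {
                // Lazy variant: constructing the filesystem here adds a hook mid-shutdown.
                FakeFileSystem fs = (eagerFs != null) ? eagerFs : new FakeFileSystem(manager);
                cleaned[0] = fs.delete("/hypothetical/staging/dir");
            } catch (IllegalStateException e) {
                cleaned[0] = false; // the failure reported in this issue
            }
        });
        manager.runShutdown();
        return cleaned[0];
    }

    public static void main(String[] args) {
        System.out.println("lazy FS creation inside hook: cleaned=" + simulate(false)); // false
        System.out.println("FS created before shutdown:  cleaned=" + simulate(true));  // true
    }
}
```

      This only illustrates the general idea of not creating a new filesystem object inside the shutdown hook; it is not necessarily how the actual patch for this issue is structured.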

        


            People

              Assignee: Xing Lin
              Reporter: Xing Lin
              Erik Krogen
              Votes: 0
              Watchers: 3
