Spark / SPARK-50034

Fix Misreporting of Fatal Errors as Uncaught Exceptions in SparkUncaughtExceptionHandler

Details

    Description

      In Executor.scala, an exception is treated as fatal (as determined by isFatalError()) if the exception itself or any exception in its cause chain is fatal. The spark.executor.killOnFatalError.depth config limits how far down the chain this check looks. If a fatal error is found, SparkUncaughtExceptionHandler is invoked.
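A minimal sketch of a depth-limited check of this kind, for illustration only. It assumes Scala's `scala.util.control.NonFatal` as a stand-in for Spark's own fatality test, which may classify some throwables differently; the object name `FatalChainCheck` is hypothetical and is not Spark's code.

```scala
import scala.annotation.tailrec
import scala.util.control.NonFatal

// Hypothetical sketch: is any throwable within the first `depthToCheck` links
// of the cause chain fatal? NonFatal is used as an approximation of Spark's
// fatality test (assumption); OutOfMemoryError counts as fatal under NonFatal.
object FatalChainCheck {
  @tailrec
  def isFatalError(t: Throwable, depthToCheck: Int): Boolean =
    if (t == null || depthToCheck <= 0) false       // chain ended or depth budget used up
    else if (!NonFatal(t)) true                      // this link itself is fatal
    else isFatalError(t.getCause, depthToCheck - 1)  // move one level down the cause chain
}
```

With the three-level chain from the example in this issue (RuntimeException, RuntimeException, OutOfMemoryError), a depth of 3 reaches the OutOfMemoryError and returns true, while a depth of 2 stops one level short and returns false.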

      However, SparkUncaughtExceptionHandler currently inspects only the top-level exception when choosing the exit code, rather than traversing the exception chain to find the actual fatal cause. As a result, some fatal errors, such as OutOfMemoryError, are misreported as uncaught exceptions.

      For instance, consider an OOM wrapped in the following exception chain:

      RuntimeException
       - Caused by: RuntimeException
       - Caused by: java.lang.OutOfMemoryError

      In this case, SparkUncaughtExceptionHandler terminates the executor with exit code SparkExitCode.UNCAUGHT_EXCEPTION rather than SparkExitCode.OOM, even though the root cause is an OutOfMemoryError.
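The misreport can be reproduced with a hypothetical reduction of the current behaviour, in which only the top-level throwable is pattern-matched. The object `TopLevelOnly` is illustrative, and the exit-code values (50 for an uncaught exception, 52 for OOM) are assumed to mirror SparkExitCode; treat them as illustrative rather than authoritative.

```scala
// Hypothetical reduction of the current handler logic: only the top-level
// throwable is matched, so a wrapped OutOfMemoryError is never seen.
object TopLevelOnly {
  val UncaughtException = 50 // assumed to mirror SparkExitCode.UNCAUGHT_EXCEPTION
  val Oom = 52               // assumed to mirror SparkExitCode.OOM

  def exitCodeFor(t: Throwable): Int = t match {
    case _: OutOfMemoryError => Oom               // only fires if the OOM is top-level
    case _                   => UncaughtException // everything else, including wrapped OOMs
  }
}
```

Passing the RuntimeException-wrapped chain from the example above returns `UncaughtException`, whereas a bare `OutOfMemoryError` returns `Oom`. This mismatch is the behaviour described in this issue.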

      This change modifies SparkUncaughtExceptionHandler to:

      • Inspect the exception chain (up to the configured depth).
      • Ensure that the actual fatal error is correctly identified and reflected in the exit code.
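One possible shape for the proposed behaviour is sketched below, assuming the handler receives the same depth limit as spark.executor.killOnFatalError.depth. The object name `ChainAwareExitCode` and helper `findOom` are hypothetical, the exit-code values are the same assumed SparkExitCode mirrors as above, and only OutOfMemoryError is mapped here; other fatal error types would need their own cases.

```scala
import scala.annotation.tailrec

// Hypothetical chain-aware variant: walk at most `depth` links of the cause
// chain and report OOM if an OutOfMemoryError is found anywhere within it.
object ChainAwareExitCode {
  val UncaughtException = 50 // assumed to mirror SparkExitCode.UNCAUGHT_EXCEPTION
  val Oom = 52               // assumed to mirror SparkExitCode.OOM

  @tailrec
  private def findOom(t: Throwable, depth: Int): Option[OutOfMemoryError] = t match {
    case null                  => None                          // end of the cause chain
    case _ if depth <= 0       => None                          // depth limit reached
    case oom: OutOfMemoryError => Some(oom)                     // the true fatal cause
    case other                 => findOom(other.getCause, depth - 1)
  }

  def exitCodeFor(t: Throwable, depth: Int): Int =
    findOom(t, depth) match {
      case Some(_) => Oom
      case None    => UncaughtException
    }
}
```

Bounding the walk by the same depth used for fatal-error detection keeps the two decisions consistent: an error the executor treated as fatal because of a nested OOM is also reported with the OOM exit code.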

            People

              Assignee: Mingkang Li (mingkangli)
              Reporter: Mingkang Li (mingkangli)
              Votes: 0
              Watchers: 1