Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-16349

ShutdownableThread fails build by calling Exit with race condition

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 3.8.0
    • 3.8.0
    • core
    • None

    Description

      `ShutdownableThread` calls `Exit.exit()` when the thread's operation throws FatalExitError. In normal operation, this calls System.exit, and exits the process. In tests, the exit procedure is masked with Exit.setExitProcedure to prevent tests that encounter a FatalExitError from crashing the test JVM.

      Masking of exit procedures is usually done in BeforeEach/AfterEach annotations, with the exit procedures cleaned up immediately after the test finishes. If the body of the test creates a ShutdownableThread that outlives the test, such as by omitting `ShutdownableThread#awaitShutdown`, by having `ShutdownableThread#awaitShutdown` be interrupted by a test timeout, or by a race condition between `Exit.resetExitProcedure` and `Exit.exit`, then System.exit() can be called erroneously.

       

      // First, in the test thread:
      Exit.setExitProcedure(...)
      try {
          new ShutdownableThread(...).start()
      } finally {
          Exit.resetExitProcedure()
      }
      // Second, in the ShutdownableThread:
      try {
          throw new FatalExitError(...)
      } catch (FatalExitError e) {
          Exit.exit(...) // Calls real System.exit()
      }

       

      This can be resolved by one of the following:

      1. Eliminate FatalExitError usages in code when setExitProcedure is in-use
      2. Eliminate the Exit.exit call from ShutdownableThread, and instead propagate this error to another thread to handle without a race-condition
      3. Eliminate resetExitProcedure by refactoring Exit to be non-static

      FatalExitError is in use in a small number of places, but may be difficult to eliminate:

      • FinalizedFeatureChangeListener
      • InterBrokerSendThread
      • TopicBasedRemoteLogMetadataManager

      There are many other places where Exit is called from a background thread, including some implementations of ShutdownableThread which don't use FatalExitError.

      The effect of this bug is that the build is flaky, as race conditions/timeouts in tests can cause the gradle executors to exit with status code 1, which has happened 26 times in the last 28 days. I have not yet been able to confirm this bug is happening in other tests, but I do have a deterministic reproduction case with the exact same symptoms:

      Gradle Test Run :core:test > Gradle Test Executor 38 > ShutdownableThreadTest > testShutdownWhenTestTimesOut(boolean) > "testShutdownWhenTestTimesOut(boolean).isInterruptible=true" SKIPPED
      FAILURE: Build failed with an exception.
      * What went wrong:
      Execution failed for task ':core:test'.
      > Process 'Gradle Test Executor 38' finished with non-zero exit value 1
        This problem might be caused by incorrect test process configuration.
        For more on test execution, please refer to https://docs.gradle.org/8.6/userguide/java_testing.html#sec:test_execution in the Gradle documentation.

      Attachments

        Issue Links

          Activity

            People

              gharris1727 Greg Harris
              gharris1727 Greg Harris
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: