Description
`ShutdownableThread` calls `Exit.exit()` when the thread's operation throws FatalExitError. In normal operation, this calls System.exit, and exits the process. In tests, the exit procedure is masked with Exit.setExitProcedure to prevent tests that encounter a FatalExitError from crashing the test JVM.
Masking of exit procedures is usually done in BeforeEach/AfterEach annotations, with the exit procedures cleaned up immediately after the test finishes. If the body of the test creates a ShutdownableThread that outlives the test, such as by omitting `ShutdownableThread#awaitShutdown`, by having `ShutdownableThread#awaitShutdown` be interrupted by a test timeout, or by a race condition between `Exit.resetExitProcedure` and `Exit.exit`, then System.exit() can be called erroneously.
// First, in the test thread: Exit.setExitProcedure(...) try { new ShutdownableThread(...).start() } finally { Exit.resetExitProcedure() } // Second, in the ShutdownableThread: try { throw new FatalExitError(...) } catch (FatalExitError e) { Exit.exit(...) // Calls real System.exit() }
This can be resolved by one of the following:
- Eliminate FatalExitError usages in code when setExitProcedure is in-use
- Eliminate the Exit.exit call from ShutdownableThread, and instead propagate this error to another thread to handle without a race-condition
- Eliminate resetExitProcedure by refactoring Exit to be non-static
FatalExitError is in use in a small number of places, but may be difficult to eliminate:
- FinalizedFeatureChangeListener
- InterBrokerSendThread
- TopicBasedRemoteLogMetadataManager
There are many other places where Exit is called from a background thread, including some implementations of ShutdownableThread which don't use FatalExitError.
The effect of this bug is that the build is flaky, as race conditions/timeouts in tests can cause the gradle executors to exit with status code 1, which has happened 26 times in the last 28 days. I have not yet been able to confirm this bug is happening in other tests, but I do have a deterministic reproduction case with the exact same symptoms:
Gradle Test Run :core:test > Gradle Test Executor 38 > ShutdownableThreadTest > testShutdownWhenTestTimesOut(boolean) > "testShutdownWhenTestTimesOut(boolean).isInterruptible=true" SKIPPED FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':core:test'. > Process 'Gradle Test Executor 38' finished with non-zero exit value 1 This problem might be caused by incorrect test process configuration. For more on test execution, please refer to https://docs.gradle.org/8.6/userguide/java_testing.html#sec:test_execution in the Gradle documentation.
Attachments
Issue Links
- is related to
-
KAFKA-16420 Refactor utils.Exit call-sites to use thread-safe alternative
- Open
- links to