Uploaded image for project: 'Kafka'
  1. Kafka
  2. KAFKA-13593

ThrottledChannelReaper slows broker shutdown by multiple seconds



    • Improvement
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 3.0.0
    • None
    • core
    • None


      We run an embedded KRaft broker in integration tests, to test that our Producer / Consumers are all hooked up and verify our overall pipeline.

      While trying to deliver CI speed improvements, we noticed that the majority of time for a small test is actually spent in Kafka broker shutdown. From the logs, it looks like the ClientQuotaManager has multiple ThrottledChannelReaper threads and each of them takes up to a second to shutdown.

      2022-01-12T15:26:31.932Z [main] INFO  kafka.log.LogManager - Shutdown complete.
      2022-01-12T15:26:31.934Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Shutting down
      2022-01-12T15:26:32.311Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Shutdown completed
      2022-01-12T15:26:32.311Z [ThrottledChannelReaper-Fetch] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Fetch]: Stopped
      2022-01-12T15:26:32.311Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Shutting down
      2022-01-12T15:26:33.312Z [ThrottledChannelReaper-Produce] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Stopped
      2022-01-12T15:26:33.312Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Produce]: Shutdown completed
      2022-01-12T15:26:33.312Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Shutting down
      2022-01-12T15:26:34.315Z [ThrottledChannelReaper-Request] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Stopped
      2022-01-12T15:26:34.315Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-Request]: Shutdown completed
      2022-01-12T15:26:34.315Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Shutting down
      2022-01-12T15:26:35.317Z [main] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Shutdown completed
      2022-01-12T15:26:35.317Z [ThrottledChannelReaper-ControllerMutation] INFO  k.s.ClientQuotaManager$ThrottledChannelReaper - [ThrottledChannelReaper-ControllerMutation]: Stopped

      Inspecting the code, the ThrottledChannelReaper threads are marked as not interruptible, so ShutdownableThread does not interrupt the worker on shutdown. Unfortunately, the doWork method polls with a 1 second timeout. So you wait up to 1s for every type of quota, in this case for a total of almost 4s.


      While this is not a problem for production Kafka brokers, where instances are expected to stay up for months, in tests that expect to spin up and down it is easily noticed and adds real overhead to CI.


      Suggested possible remediations:

      • Allow ThrottledChannelReaper to be interrupted by ShutdownableThread
      • ThrottledChannelReaper overrides initiateShutdown and after calling super.initiateShutdown then enqueues a null element on the delayQueue to force worker thread to notice shutdown state
      • Reduce 1s poll timeout to something small (carries overhead penalty for all users though, so this is less desirable), or make it configurable so we can set it to e.g. 10ms in unit tests




            Unassigned Unassigned
            stevenschlansker Steven Schlansker
            0 Vote for this issue
            3 Start watching this issue