Kafka / KAFKA-10723

LogManager leaks internal thread pool activity during shutdown


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0
    • Component/s: None
    • Labels: None

    Description

      TL;DR:

      The asynchronous shutdown in LogManager has a shortcoming: if any of the internal futures fails during shutdown, we do not always ensure that all futures have completed before LogManager.shutdown returns. As a result, even though the "shut down completed" message from KafkaServer appears in the error logs, some futures continue to run inside LogManager, attempting to close the logs. This is misleading, and it could break the general rule of avoiding post-shutdown activity in the Broker.

      Description:

      When LogManager is shutting down, exceptions in log closure are caught and handled. However, the finally clause shuts down the thread pools asynchronously. The code threadPools.foreach(_.shutdown()) initiates an orderly shutdown of each thread pool, in which previously submitted tasks are executed but no new tasks are accepted (see the ExecutorService.shutdown javadoc). The call does not wait for those tasks to finish. Consequently, if there is an exception during log closure, some of the thread pools that are closing logs can be leaked and continue to run in the background after control returns to the caller (i.e. KafkaServer). Even after the "shut down completed" message is seen in the error logs (originating from the KafkaServer shutdown sequence), log closures continue to happen in the background, which is misleading.
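
      The behaviour of ExecutorService.shutdown() described above can be reproduced in isolation. The following is a minimal Java sketch, not Kafka code: the class and method names are hypothetical, and a single blocked task stands in for a log closure that is still in progress. It shows that shutdown() returns while the submitted task is still running.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only illustrates ExecutorService.shutdown() semantics.
final class ShutdownLeakDemo {

    /** Returns true if the pool still had work in flight right after shutdown() returned. */
    static boolean workStillRunningAfterShutdown() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Stand-in for a slow log closure: blocks until we release it.
            pool.submit(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            started.await();      // make sure the task is actually running

            pool.shutdown();      // orderly shutdown: returns without waiting
            boolean stillRunning = !pool.isTerminated();

            release.countDown();  // let the task finish
            pool.awaitTermination(5, TimeUnit.SECONDS);
            return stillRunning;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("work in flight after shutdown(): "
                + workStillRunningAfterShutdown());
    }
}
```

      In the sketch the caller regains control between shutdown() and the release of the task, which corresponds to the window in which KafkaServer can log its completion message while logs are still being closed.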
       

      Proposed options for fixes:

      It seems useful to maintain the contract with KafkaServer that once LogManager.shutdown is called, all tasks that close the logs are guaranteed to have completed before the call returns. There are probably a couple of different ways to fix this:

      1. Replace threadPools.foreach(_.shutdown()) with threadPools.foreach(_.awaitTermination()). This ensures that we wait for all threads to shut down before the LogManager.shutdown call returns.
      2. Skip creating the checkpoint and clean shutdown files only for the affected directory if any of its futures throws an error, and continue to wait for all futures to complete for all directories. This may require some changes to the existing for loop, so that we wait for all futures to complete regardless of whether one of them threw an error.
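
      A rough Java sketch of how the two options could combine is given below. It is not the actual LogManager change: the class name, method name, pool size, timeout, and directory keys are all assumptions made for illustration. One detail worth noting is that ExecutorService.awaitTermination takes a timeout and a unit and only waits for termination after shutdown() has been requested, so the sketch calls shutdown() first and then awaitTermination.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; all names here are hypothetical, not Kafka's.
final class DrainingShutdownSketch {

    /**
     * Runs the close jobs of each directory on its own pool, waits for every
     * job even when some fail, and returns the directories that had a failure.
     */
    static List<String> closeAll(Map<String, List<Runnable>> closeJobsByDir) {
        List<ExecutorService> pools = new ArrayList<>();
        Map<String, List<Future<?>>> futuresByDir = new LinkedHashMap<>();
        List<String> failedDirs = new ArrayList<>();
        boolean interrupted = false;
        try {
            for (Map.Entry<String, List<Runnable>> entry : closeJobsByDir.entrySet()) {
                ExecutorService pool = Executors.newFixedThreadPool(2); // assumed size
                pools.add(pool);
                List<Future<?>> futures = new ArrayList<>();
                for (Runnable job : entry.getValue()) {
                    futures.add(pool.submit(job));
                }
                futuresByDir.put(entry.getKey(), futures);
            }
            // Option 2: wait on every future; a failure only marks its directory.
            for (Map.Entry<String, List<Future<?>>> entry : futuresByDir.entrySet()) {
                boolean dirOk = true;
                for (Future<?> future : entry.getValue()) {
                    try {
                        future.get();
                    } catch (ExecutionException e) {
                        dirOk = false; // keep waiting on the remaining futures
                    } catch (InterruptedException e) {
                        interrupted = true;
                        dirOk = false;
                    }
                }
                if (!dirOk) {
                    failedDirs.add(entry.getKey());
                }
                // else: this is where the checkpoint and clean shutdown file
                // would be written for the directory.
            }
        } finally {
            // Option 1: request shutdown, then block until each pool has drained.
            for (ExecutorService pool : pools) {
                pool.shutdown();
            }
            for (ExecutorService pool : pools) {
                try {
                    // Assumed timeout; a real fix must decide what to do if it expires.
                    pool.awaitTermination(30, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    interrupted = true;
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt();
            }
        }
        return failedDirs;
    }
}
```

      With this shape, a failing close job in one directory does not stop the caller from waiting on the other directories, and no pool is left running once closeAll returns, short of the assumed timeout expiring.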

          People

            Assignee: Kowshik Prakasam (kprakasam)
            Reporter: Kowshik Prakasam (kprakasam)