[KAFKA-10723] LogManager leaks internal thread pool activity during shutdown - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.8.0
Component/s: None
Labels: None

Description

TL;DR:

The asynchronous shutdown in LogManager has a shortcoming: if any of the internal futures fails during shutdown, we do not always ensure that all futures have completed before LogManager.shutdown returns. As a result, even though the "shut down completed" message from KafkaServer appears in the error logs, some futures continue to run inside LogManager, attempting to close the logs. This is misleading, and it could break the general rule of avoiding post-shutdown activity in the Broker.

Description:

When LogManager is shutting down, exceptions during log closure are caught and handled. However, the finally clause shuts down the thread pools asynchronously. The code threadPools.foreach(_.shutdown()) initiates an orderly shutdown of each thread pool, in which previously submitted tasks are executed but no new tasks are accepted (see the ExecutorService javadoc). As a result, if there is an exception during log closure, some of the thread pools that are closing logs can be leaked and continue to run in the background after control returns to the caller (i.e. KafkaServer). Consequently, even after the "shut down completed" message appears in the error logs (originating from the KafkaServer shutdown sequence), log closures continue to happen in the background, which is misleading.
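
The non-blocking behavior of shutdown() can be reproduced outside Kafka. The following is a minimal sketch in Java (Kafka's LogManager itself is Scala); the class name ShutdownLeakDemo and the latch-based task are hypothetical stand-ins for a slow log close and are not taken from the Kafka code base.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Kafka.
class ShutdownLeakDemo {

    /** Returns {terminated right after shutdown(), terminated after awaitTermination()}. */
    static boolean[] run() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CountDownLatch release = new CountDownLatch(1);

        // Stand-in for a log close that is still in progress.
        pool.submit(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // shutdown() only stops new submissions; it returns without waiting.
        pool.shutdown();
        boolean terminatedRightAfterShutdown = pool.isTerminated();

        // Let the pending task finish, then block until the pool has drained.
        release.countDown();
        boolean terminatedAfterAwait;
        try {
            terminatedAfterAwait = pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            terminatedAfterAwait = false;
        }
        return new boolean[] { terminatedRightAfterShutdown, terminatedAfterAwait };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("terminated right after shutdown(): " + r[0]);
        System.out.println("terminated after awaitTermination(): " + r[1]);
    }
}
```

A caller that returns immediately after shutdown(), as in the first check, can report completion while the submitted task is still running. That is the situation described above.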

Proposed options for fixes:

It seems useful to maintain the contract with KafkaServer that once LogManager.shutdown has been called, all tasks that close the logs are guaranteed to have completed before the call returns. There are probably a couple of different ways to fix this:

1. Replace threadPools.foreach(_.shutdown()) with threadPools.foreach(_.awaitTermination()). This ensures that we wait for all threads to shut down before LogManager.shutdown returns.
2. Skip creating the checkpoint and clean shutdown file only for the affected directory if any of its futures throws an error, and continue to wait for all futures to complete for all directories. This may require some changes to the for loop that waits on the futures, so that we wait for all of them to complete regardless of whether one of them threw an error.
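
The two options above can be combined in one illustrative sketch, written in Java rather than Kafka's Scala. It is not the change made in the linked pull request. The names DrainingShutdownSketch, closeAll and demo, and the 30-second timeout, are assumptions chosen for illustration. Note that ExecutorService.awaitTermination takes a timeout and a time unit and does not itself initiate a shutdown, so the sketch calls shutdown() first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the actual Kafka fix.
class DrainingShutdownSketch {

    /**
     * Submits every close job, waits for all of them even if some fail,
     * then shuts the pool down and blocks until it has terminated.
     * Returns the number of jobs that failed.
     */
    static int closeAll(ExecutorService pool, List<Runnable> closeJobs) {
        List<Future<?>> futures = new ArrayList<>();
        for (Runnable job : closeJobs) {
            futures.add(pool.submit(job));
        }
        int failures = 0;
        try {
            for (Future<?> f : futures) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    // Record the failure and keep waiting on the remaining futures.
                    failures++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
            try {
                // Assumed timeout; a real caller would choose its own bound.
                pool.awaitTermination(30, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return failures;
    }

    /** Returns {failed jobs, successfully closed jobs, 1 if the pool terminated else 0}. */
    static int[] demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger closed = new AtomicInteger();
        List<Runnable> jobs = new ArrayList<>();
        jobs.add(() -> { throw new IllegalStateException("simulated close failure"); });
        for (int i = 0; i < 2; i++) {
            jobs.add(() -> {
                try {
                    Thread.sleep(50); // simulated slow close
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                closed.incrementAndGet();
            });
        }
        int failures = closeAll(pool, jobs);
        return new int[] { failures, closed.get(), pool.isTerminated() ? 1 : 0 };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("failures=" + r[0] + " closed=" + r[1] + " terminated=" + (r[2] == 1));
    }
}
```

In this sketch a failing job does not cut the wait short: the remaining futures are still awaited, and the method returns only after the pool has terminated or the timeout has elapsed. A per-directory variant could use the failure count to decide whether to skip the checkpoint and clean shutdown file for that directory.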

Issue Links

links to

GitHub Pull Request #9596

People

Assignee: Kowshik Prakasam
Reporter: Kowshik Prakasam
Votes: 0
Watchers: 3

Dates

Created: 13/Nov/20 23:53
Updated: 19/Nov/20 18:55
Resolved: 19/Nov/20 18:55