KAFKA-6051

ReplicaFetcherThread should close the ReplicaFetcherBlockingSend earlier on shutdown


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1, 0.10.2.0, 0.10.2.1, 0.11.0.0
    • Fix Version/s: 1.1.0
    • Component/s: None
    • Labels: None

    Description

      The ReplicaFetcherBlockingSend works as designed: it blocks until it is able to get data. This becomes a problem when we are gracefully shutting down a broker. The controller will attempt to shut down the fetchers and elect new leaders. When the last fetched partition is removed as part of the replicaManager.becomeLeaderOrFollower call, it proceeds to shut down any idle ReplicaFetcherThread. The shutdown process here can block until the last fetch request completes. This blocking delay is a big problem because the replicaStateChangeLock and the mapLock in AbstractFetcherManager are still held, causing latency spikes on multiple brokers.

      At this point, we no longer need the last response, since the fetcher is shutting down. We should close the leaderEndpoint early, during initiateShutdown(), instead of after super.shutdown().

      For example, in the log below the shutdown blocked the broker from processing further replica changes for ~500 ms:

      [2017-09-01 18:11:42,879] INFO [ReplicaFetcherThread-0-2], Shutting down (kafka.server.ReplicaFetcherThread) 
      [2017-09-01 18:11:43,314] INFO [ReplicaFetcherThread-0-2], Stopped (kafka.server.ReplicaFetcherThread) 
      [2017-09-01 18:11:43,314] INFO [ReplicaFetcherThread-0-2], Shutdown completed (kafka.server.ReplicaFetcherThread)
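The effect of the proposed fix can be sketched with a small simulation, written in Java rather than the broker's Scala. The class and method names here (BlockingEndpoint, FetcherThread, initiateShutdown) are illustrative stand-ins, not the actual Kafka code: the endpoint models a fetch request that would otherwise block until a response arrives, and closing it during initiateShutdown() unblocks the fetcher immediately so the join completes without waiting out the request.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for ReplicaFetcherBlockingSend: a call that blocks
// until either a response arrives or the endpoint is closed.
class BlockingEndpoint {
    private final CountDownLatch closed = new CountDownLatch(1);

    // Simulates a fetch request in flight; returns only once close() is called.
    void sendAndAwait() throws InterruptedException {
        closed.await();
    }

    void close() {
        closed.countDown();
    }
}

// Hypothetical stand-in for ReplicaFetcherThread.
class FetcherThread extends Thread {
    private final BlockingEndpoint endpoint;
    private volatile boolean running = true;

    FetcherThread(BlockingEndpoint endpoint) {
        this.endpoint = endpoint;
    }

    @Override
    public void run() {
        while (running) {
            try {
                endpoint.sendAndAwait(); // blocks like an in-flight fetch
            } catch (InterruptedException ie) {
                return;
            }
            // a real fetcher would process the response here
        }
    }

    // Mirrors the proposed fix: close the endpoint as part of initiating
    // shutdown, so the blocked fetch returns right away instead of holding
    // up the join (and, in the broker, the locks held by the caller).
    void initiateShutdown() {
        running = false;
        endpoint.close();
    }
}

public class EarlyCloseDemo {
    public static void main(String[] args) throws Exception {
        BlockingEndpoint endpoint = new BlockingEndpoint();
        FetcherThread fetcher = new FetcherThread(endpoint);
        fetcher.start();
        Thread.sleep(100);          // let the fetcher block in sendAndAwait()

        fetcher.initiateShutdown(); // close first, then wait for the thread
        fetcher.join(5000);
        System.out.println("fetcher stopped: " + !fetcher.isAlive());
    }
}
```

Had the endpoint been closed only after super.shutdown(), the join would have had to wait for the outstanding request, which is exactly the ~500 ms gap visible in the log above.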
      


            People

              Assignee: mchinavan (Maytee Chinavanichkit)
              Reporter: mchinavan (Maytee Chinavanichkit)
