Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Affects Version/s: 0.8.1
- Fix Version/s: None
- Component/s: None
Description
I don't think this is a bug introduced in 0.8.1, but rather one triggered by
the fact that controlled shutdown seems to have become slower in 0.8.1 (will
file a separate ticket to investigate that). When doing a rolling bounce, it
is possible for a bounced broker to stop all of its replica fetchers because
the previous PID's shutdown requests are still being handled.
- 515 is the controller
- Controlled shutdown initiated for 503
- Controller starts controlled shutdown for 503
- The controlled shutdown takes a long time moving leaders and moving
  follower replicas on 503 to the offline state.
- So 503's read from the shutdown channel times out and a new channel is
  created. It issues another shutdown request. Since this request arrives on
  a new channel, it is accepted at the controller's socket server but then
  waits on the broker shutdown lock held by the previous controlled shutdown,
  which is still in progress.
- The above step repeats for the remaining retries (six more requests; see
  the sketch after this list).
- 503 hits a SocketTimeoutException on reading the response to the last
  shutdown request and proceeds to do an unclean shutdown.
- The controller's onBrokerFailure callback fires and moves 503's replicas
  to offline (not too important in this sequence).
- 503 is brought back up.
- The controller's onBrokerStartup callback fires and moves its replicas
  (and partitions) to the online state. 503 starts its replica fetchers.
- Unfortunately, the (phantom) shutdown requests are still being handled,
  and the controller sends StopReplica requests to 503.
- The first shutdown request finally finishes (after 76 minutes in my case!).
- The remaining shutdown requests also execute and do the same thing (send
  StopReplica requests for all partitions to 503).
- The remaining requests complete quickly because they do not have to touch
  any ZooKeeper paths: no leaders are left on the broker, and there is no
  need to shrink the ISR in ZooKeeper since the first shutdown request
  already did that.
- So in the end state, 503 is up but effectively idle due to the previous
  PID's shutdown requests.
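To make the race concrete, here is a minimal, self-contained Java sketch of
the interaction. All names are invented for illustration (this is not the
actual Kafka code): a single lock stands in for the controller's per-broker
shutdown lock, and Future.get with a timeout stands in for the client's
socket read timeout. Each retry is accepted server-side because it arrives on
a fresh channel, so by the time the broker gives up, seven handlers are queued
behind the slow first attempt, and all of them eventually send StopReplica.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

public class PhantomShutdownSketch {
    // Controller side: all controlled-shutdown handlers for a broker
    // serialize on one lock (stand-in for the broker shutdown lock).
    static final ReentrantLock brokerShutdownLock = new ReentrantLock();
    static final ExecutorService controllerHandlers = Executors.newCachedThreadPool();

    // The controller accepts the request (it arrived on a fresh channel)
    // and queues a handler that blocks until the lock is free.
    static Future<String> handleShutdownRequest(int attempt, long workMs) {
        return controllerHandlers.submit(() -> {
            brokerShutdownLock.lock();   // later attempts park here
            try {
                Thread.sleep(workMs);    // move leaders, shrink ISR, ...
                return "attempt " + attempt + ": StopReplica sent to broker 503";
            } finally {
                brokerShutdownLock.unlock();
            }
        });
    }

    public static void main(String[] args) throws Exception {
        long socketTimeoutMs = 100;      // client stops waiting long before...
        long firstAttemptWorkMs = 2_000; // ...the slow first attempt finishes
        List<Future<String>> inFlight = new ArrayList<>();

        for (int attempt = 1; attempt <= 7; attempt++) { // initial try + six retries
            Future<String> response =
                handleShutdownRequest(attempt, attempt == 1 ? firstAttemptWorkMs : 10);
            inFlight.add(response);
            try {
                System.out.println(response.get(socketTimeoutMs, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                // Analogue of the client's SocketTimeoutException: it abandons
                // the channel, but the server-side handler keeps running.
                System.out.println("attempt " + attempt + " timed out; retrying on a new channel");
            }
        }

        System.out.println("-- broker 503 does an unclean shutdown, restarts, rejoins --");
        for (Future<String> phantom : inFlight) {
            // Every queued (phantom) request eventually fires, stopping the
            // replicas that the restarted broker just brought back online.
            System.out.println(phantom.get() + " (against the restarted broker)");
        }
        controllerHandlers.shutdown();
    }
}
{code}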
There are some obvious fixes that can be made to controlled shutdown to help
address the above issue. E.g., we don't really need to move follower
partitions to Offline. We did that as an "optimization" so the broker falls
out of ISR sooner, which is helpful when producers set required.acks to -1;
however, it adds a lot of latency to controlled shutdown. Also (more
importantly), we should have a mechanism to abort any stale shutdown process;
one possible shape for such a guard is sketched below.
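As one illustration of what aborting a stale shutdown could look like, here
is a hedged sketch with hypothetical names (not a proposal of the exact Kafka
implementation): each shutdown attempt for a broker is tagged with a
monotonically increasing generation, and a long-running handler re-checks the
generation between expensive steps, aborting once it has been superseded.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class ShutdownGenerationGuard {
    // Highest generation seen for this broker's controlled shutdown.
    private final AtomicLong currentGeneration = new AtomicLong();

    // Called when a new shutdown request arrives for the broker, or when the
    // broker re-registers after a restart; invalidates all older attempts.
    public long bumpGeneration() {
        return currentGeneration.incrementAndGet();
    }

    // A long-running shutdown handler calls this between expensive steps and
    // aborts instead of acting (e.g., sending StopReplica) on a stale attempt.
    public void checkStillCurrent(long myGeneration) {
        long latest = currentGeneration.get();
        if (myGeneration != latest) {
            throw new IllegalStateException("controlled shutdown attempt "
                + myGeneration + " is stale (latest is " + latest + "); aborting");
        }
    }
}
{code}

A handler would capture the generation when its request is accepted and call
checkStillCurrent before each step (moving a leader, shrinking the ISR,
sending StopReplica), while onBrokerStartup would bump the generation, so
phantom requests abort instead of stopping replicas on the freshly restarted
broker. The linked KAFKA-7235 addresses the same class of staleness from the
broker side, using brokerZkNodeVersion to reject outdated controller requests.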
Issue Links
- is duplicated by
  - KAFKA-4207: Partitions stopped after a rapid restart of a broker (Resolved)
- is part of
  - KAFKA-7235: Use brokerZkNodeVersion to prevent broker from processing outdated controller request (Resolved)