[KAFKA-4478] Deadlock between heartbeat executor, group metadata manager and request handler - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.10.1.0
Fix Version/s: None
Component/s: None
Labels:
- reliability

Description

We are running a 0.10.1.0 cluster with three brokers (ids 0, 1 and 2).
At about 2016-12-01 21:29 something happened to broker 1, and since then I have been seeing

java.io.IOException: Connection to 1 was disconnected before the response was read

errors in the logs of brokers 0 and 2. Clients were unable to produce to broker 1's partitions, and the JMX counters indicate under-replicated partitions.

I took a thread dump (jstack) on broker 1, and it shows a deadlock between JVM threads:

Found one Java-level deadlock:
=============================
"executor-Heartbeat":
  waiting to lock monitor 0x00007ffa24029df8 (object 0x00000000cc52fe70, a kafka.coordinator.GroupMetadata),
  which is held by "group-metadata-manager-0"
"group-metadata-manager-0":
  waiting to lock monitor 0x00007ff9900a83a8 (object 0x00000000ca8b0820, a java.util.LinkedList),
  which is held by "kafka-request-handler-7"
"kafka-request-handler-7":
  waiting to lock monitor 0x00007ffa24029df8 (object 0x00000000cc52fe70, a kafka.coordinator.GroupMetadata),
  which is held by "group-metadata-manager-0"
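
For readers unfamiliar with this failure mode, the cycle in the dump is a lock-order inversion: "group-metadata-manager-0" holds the GroupMetadata monitor and wants the LinkedList monitor, while "kafka-request-handler-7" holds the LinkedList monitor and wants GroupMetadata. "executor-Heartbeat" is queued behind the same GroupMetadata monitor. The following is a minimal, self-contained sketch of that pattern, not Kafka's actual code: the class `DeadlockSketch`, the method `reproduceAndDetect`, and the two stand-in lock objects are hypothetical, and only the thread names are borrowed from the dump. It uses the standard `ThreadMXBean.findMonitorDeadlockedThreads()` call to detect the cycle from inside the JVM.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedList;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;

final class DeadlockSketch {

    /** Starts two threads that take two monitors in opposite order, then reports who is deadlocked. */
    static Set<String> reproduceAndDetect() throws InterruptedException {
        // Hypothetical stand-ins for the two monitors named in the thread dump.
        final Object groupMetadata = new Object();
        final LinkedList<Object> queue = new LinkedList<>();
        // Ensures each thread holds its first monitor before either tries the second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread manager = new Thread(() -> {
            synchronized (groupMetadata) {          // holds GroupMetadata ...
                awaitBoth(bothHoldFirstLock);
                synchronized (queue) { }            // ... and waits for the list
            }
        }, "group-metadata-manager-0");

        Thread handler = new Thread(() -> {
            synchronized (queue) {                  // holds the list ...
                awaitBoth(bothHoldFirstLock);
                synchronized (groupMetadata) { }    // ... and waits for GroupMetadata
            }
        }, "kafka-request-handler-7");

        // Daemon threads so the permanently blocked pair does not keep the JVM alive.
        manager.setDaemon(true);
        handler.setDaemon(true);
        manager.start();
        handler.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Set<String> names = new TreeSet<>();
        // Poll for up to about 5 seconds until the JVM reports the monitor cycle.
        for (int attempt = 0; attempt < 100 && names.isEmpty(); attempt++) {
            Thread.sleep(50);
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids == null) {
                continue;
            }
            for (ThreadInfo info : mx.getThreadInfo(ids)) {
                if (info != null) {
                    names.add(info.getThreadName());
                }
            }
        }
        return names;
    }

    private static void awaitBoth(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlocked threads: " + reproduceAndDetect());
    }
}
```

The usual remedy for this pattern is to make every code path acquire the two monitors in the same order, or to avoid holding one while taking the other. Whether that matches the fix applied in the linked issue is not stated here.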

Attachments

- broker-0.controller.log.2016-12-01-21 (63 kB), 02/Dec/16 13:05, Sinóros-Szabó Péter
- broker-0.server.log.2016-12-01-21 (29 kB), 02/Dec/16 13:05, Sinóros-Szabó Péter
- kafka-1.jstack (66 kB), 02/Dec/16 13:05, Sinóros-Szabó Péter
- kafka-1.server.log.2016-12-01-21 (10 kB), 02/Dec/16 13:05, Sinóros-Szabó Péter
- kafka-1.state-change.log (10 kB), 02/Dec/16 13:05, Sinóros-Szabó Péter

Issue Links

duplicates

KAFKA-3994 Deadlock between consumer heartbeat expiration and offset commit.

Resolved

People

Assignee: Unassigned

Reporter: Sinóros-Szabó Péter

Votes: 0

Watchers: 5

Dates

Created: 02/Dec/16 13:00

Updated: 03/Dec/16 00:12

Resolved: 03/Dec/16 00:12