KAFKA-5395: Distributed Herder Deadlocks on Shutdown

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.10.2.1
    • Fix Version/s: 0.10.2.2, 0.11.0.0
    • Component/s: KafkaConnect
    • Labels:
      None

      Description

      We're trying to upgrade Kafka Connect to 0.10.2.1 and see that the process does not shut down cleanly; it hangs instead. From what I can tell, KAFKA-4786 introduced this deadlock.

      close() on AbstractCoordinator is marked synchronized, so it acquires the coordinator's monitor. The first thing it then tries to do is join the heartbeat thread.

      Meanwhile, the heartbeat thread is synchronized on the same monitor, which it relinquishes when it waits. But for the wait to return (and the run method of the heartbeat to terminate) it needs to reacquire that monitor.

      There's no way for the heartbeat thread to reacquire the monitor since it is held by the distributed herder thread. And the distributed herder will never relinquish the monitor since it is waiting for the heartbeat thread to join.

      I am attaching a thread dump illustrating the situation. Take note in particular of threads #178 (the heartbeat thread) and #159 (the herder thread). The former is BLOCKED trying to reacquire 0x00000007406cc0c0, and the latter is WAITING on the heartbeat thread to join, having itself acquired 0x00000007406cc0c0.
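
      The lock interaction described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not Kafka's code: the Coordinator class, its method names, and the timeouts are hypothetical stand-ins. A bounded join is used so the demonstration reports the stuck thread instead of hanging forever. Moving the join outside the monitor is shown as one common way to avoid this pattern, not necessarily the change made for this issue.

```java
public class HeartbeatDeadlockSketch {

    /** Hypothetical stand-in for the coordinator; this is NOT Kafka's AbstractCoordinator. */
    static class Coordinator {
        private boolean closed = false;
        private final Thread heartbeat = new Thread(this::runHeartbeat, "heartbeat-sketch");

        Coordinator() {
            heartbeat.setDaemon(true);
            heartbeat.start();
        }

        private void runHeartbeat() {
            synchronized (this) {
                while (!closed) {
                    try {
                        // wait() releases the monitor, but it must reacquire
                        // the monitor before it can return.
                        wait();
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        }

        /**
         * Problematic pattern: join the heartbeat thread while still holding
         * the monitor. Returns true if the heartbeat thread is still alive
         * once the bounded join gives up.
         */
        synchronized boolean closeWhileHoldingMonitor(long joinTimeoutMs) throws InterruptedException {
            closed = true;
            notifyAll();
            // With an unbounded join() this call would never return.
            heartbeat.join(joinTimeoutMs);
            return heartbeat.isAlive();
        }

        /**
         * Alternative pattern: signal shutdown under the monitor, then join
         * only after the monitor has been released.
         */
        boolean closeAfterReleasingMonitor(long joinTimeoutMs) throws InterruptedException {
            synchronized (this) {
                closed = true;
                notifyAll();
            }
            heartbeat.join(joinTimeoutMs);
            return heartbeat.isAlive();
        }
    }

    static boolean stuckWhenJoiningUnderLock() throws InterruptedException {
        return new Coordinator().closeWhileHoldingMonitor(300);
    }

    static boolean stuckWhenJoiningOutsideLock() throws InterruptedException {
        return new Coordinator().closeAfterReleasingMonitor(5000);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("join under lock, heartbeat still alive: " + stuckWhenJoiningUnderLock());
        System.out.println("join outside lock, heartbeat still alive: " + stuckWhenJoiningOutsideLock());
    }
}
```

      In the first case the heartbeat thread cannot finish until the closing thread leaves the synchronized method, so it is still alive when the bounded join times out. In the second case the monitor is free by the time join is called, so the heartbeat thread can exit.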

              People

              • Assignee:
                rsivaram Rajini Sivaram
                Reporter:
                mjaschob@twilio.com Michael Jaschob
              • Votes:
                0
                Watchers:
                5
