Kafka / KAFKA-2743

Forwarding task reconfigurations in Copycat can deadlock with rebalances and has no backoff


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0.0
    • Component/s: KafkaConnect
    • Labels: None

    Description

      There are two issues with the way we currently forward task reconfigurations. First, the forwarding is performed synchronously in the DistributedHerder's main processing loop. If node A forwards a task reconfiguration while node B has started a rebalance, we can end up in a distributed deadlock: node A blocks on the HTTP request in the same thread that would otherwise handle heartbeating and rebalancing, so it cannot take part in the rebalance that node B has started.
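      A minimal sketch of one way to avoid the first problem, assuming the blocking forward can simply be moved onto a separate executor thread: the main loop submits the forward and keeps going, so heartbeats continue while the leader is unresponsive. `AsyncForwardSketch`, `tick()` and the `Runnable` standing in for the HTTP request are hypothetical illustrations, not the DistributedHerder's actual API or the fix that was committed.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: forward off the main loop thread so the loop never blocks on HTTP. */
public class AsyncForwardSketch {
    private final ExecutorService forwardExecutor = Executors.newSingleThreadExecutor();
    private final Runnable blockingForward;   // stand-in for the HTTP request to the leader
    private CompletableFuture<Void> inFlight; // forward currently running, if any (loop thread only)
    private boolean reconfigNeeded = true;
    private int heartbeats = 0;

    public AsyncForwardSketch(Runnable blockingForward) {
        this.blockingForward = blockingForward;
    }

    /** One pass of a simplified main processing loop. */
    public void tick() {
        if (inFlight != null && inFlight.isDone()) {
            // If the forward failed (e.g. leader unreachable), schedule it again on a later pass.
            if (inFlight.isCompletedExceptionally()) reconfigNeeded = true;
            inFlight = null;
        }
        if (reconfigNeeded && inFlight == null) {
            // Hand the blocking call to another thread instead of waiting for it here.
            inFlight = CompletableFuture.runAsync(blockingForward, forwardExecutor);
            reconfigNeeded = false;
        }
        heartbeats++; // the loop thread stays free to heartbeat and join rebalances
    }

    public int heartbeats() { return heartbeats; }
    public CompletableFuture<Void> inFlight() { return inFlight; }
    public void close() { forwardExecutor.shutdown(); }

    public static void main(String[] args) {
        CountDownLatch leaderBusy = new CountDownLatch(1);
        AsyncForwardSketch herder = new AsyncForwardSketch(() -> {
            try {
                leaderBusy.await(); // the leader does not answer until its "rebalance" ends
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        for (int i = 0; i < 5; i++) herder.tick();
        System.out.println("heartbeats while forward pending: " + herder.heartbeats()
                + ", forward done: " + herder.inFlight().isDone());
        leaderBusy.countDown(); // leader finishes rebalancing and answers
        herder.inFlight().join();
        herder.close();
    }
}
```

      Because only the loop thread reads or writes `inFlight`, no extra locking is needed in this sketch; a failed forward is retried on a later pass, though still without any backoff.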

      Second, we currently retry aggressively with no backoff. In some cases the node believed to be the leader will legitimately be down (it shut down, and the node sending the request hasn't rebalanced yet), so we need some backoff to avoid needlessly hammering the network and filling huge log files with constant errors.
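      A sketch of capped exponential backoff for these retries, under assumed parameters (100 ms initial delay, 1 s cap). `ForwardBackoffSketch`, `delayMs` and `retry` are hypothetical names, and the actual fix may use a different backoff policy.

```java
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

/** Hypothetical sketch of capped exponential backoff for retrying a forwarded request. */
public class ForwardBackoffSketch {
    private final long initialMs;
    private final long maxMs;

    public ForwardBackoffSketch(long initialMs, long maxMs) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
    }

    /** Delay after failed attempt number `attempt` (0-based): initialMs * 2^attempt, capped at maxMs. */
    public long delayMs(int attempt) {
        long delay = initialMs;
        // Stop doubling once the cap is reached, which also prevents overflow.
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    /**
     * Calls `attempt` until it returns true or maxAttempts calls have been made,
     * passing the backoff delay to `sleeper` between failures. Returns true on success.
     */
    public boolean retry(BooleanSupplier attempt, int maxAttempts, LongConsumer sleeper) {
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.getAsBoolean()) return true;
            if (i + 1 < maxAttempts) sleeper.accept(delayMs(i));
        }
        return false;
    }

    public static void main(String[] args) {
        ForwardBackoffSketch backoff = new ForwardBackoffSketch(100, 1000);
        int[] failuresLeft = {3}; // pretend the old leader stays unreachable for three attempts
        boolean ok = backoff.retry(
                () -> failuresLeft[0]-- <= 0,
                10,
                delay -> System.out.println("leader unreachable, retrying in " + delay + " ms"));
        System.out.println("forwarded: " + ok);
    }
}
```

      The delay is handed to a callback rather than slept inside `retry`, so that in a herder it could be scheduled (for example with a `ScheduledExecutorService`) instead of being slept on the main processing loop's thread, which would reintroduce the blocking problem described above.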


          People

            Assignee: Ewen Cheslack-Postava (ewencp)
            Reporter: Ewen Cheslack-Postava (ewencp)
            Votes: 0
            Watchers: 3
