Apache Helix / HELIX-742

ZkHelixManager should consider session expiry when detecting connection flapping


Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      In production we are seeing issues caused by an infinite session expiry/reconnect loop. The repeated expiries and reconnects cause live instance changes, which trigger massive state transitions. As a result, the controller overloads ZK with thousands of messages and brings down the cluster.

       

      Currently, when detecting connection flapping, ZkHelixManager counts only disconnects, not session expiries. Session expiry needs to be taken into account as well.
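      A minimal sketch of the proposed behavior, assuming a sliding-window detector in which a disconnect and a session expiry each count as one connection-loss event. `FlappingDetector`, `onDisconnect`, `onSessionExpired`, and the threshold/window parameters are hypothetical names for illustration, not the actual ZkHelixManager API. Timestamps are passed in explicitly so the logic is easy to test.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch (not the Helix API): a sliding-window flapping detector
 * that counts both disconnects and session expiries as connection-loss events.
 */
public class FlappingDetector {
    private final int maxEventsInWindow; // hypothetical threshold
    private final long windowMs;         // hypothetical window length
    private final Deque<Long> eventTimes = new ArrayDeque<>();

    public FlappingDetector(int maxEventsInWindow, long windowMs) {
        if (maxEventsInWindow < 0 || windowMs <= 0) {
            throw new IllegalArgumentException("invalid threshold or window");
        }
        this.maxEventsInWindow = maxEventsInWindow;
        this.windowMs = windowMs;
    }

    /** Records a ZK disconnect; returns true if the connection is now considered flapping. */
    public synchronized boolean onDisconnect(long nowMs) {
        return record(nowMs);
    }

    /** Records a ZK session expiry; counted exactly like a disconnect. */
    public synchronized boolean onSessionExpired(long nowMs) {
        return record(nowMs);
    }

    private boolean record(long nowMs) {
        eventTimes.addLast(nowMs);
        // Drop events that have fallen out of the sliding window.
        while (!eventTimes.isEmpty() && eventTimes.peekFirst() <= nowMs - windowMs) {
            eventTimes.pollFirst();
        }
        // Flapping once the number of events in the window exceeds the threshold.
        return eventTimes.size() > maxEventsInWindow;
    }

    public static void main(String[] args) {
        FlappingDetector detector = new FlappingDetector(2, 10_000L);
        // An expiry/reconnect loop that produces only session-expiry events:
        System.out.println(detector.onSessionExpired(0L));     // false (1 event)
        System.out.println(detector.onSessionExpired(1_000L)); // false (2 events)
        System.out.println(detector.onSessionExpired(2_000L)); // true  (3 events > 2)
    }
}
```

      Because both callbacks feed the same window, an expiry/reconnect loop that never produces a plain disconnect still trips the detector, which is the gap this ticket describes. What ZkHelixManager should do once flapping is detected is outside the scope of this sketch.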

       

      AC:

      • Follow up on this ticket with a plan to consolidate the semantics and behavior
      • Complete the code change and test it


    People

      • Assignee: Unassigned
      • Reporter: Harry Zhang (hzzh0301)
