Apache Helix / HELIX-195 (sub-task of HELIX-134: HelixManager zk session expiry/gc handling)

Race condition between FINALIZE callbacks and Zk Callbacks


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6.2-incubating
    • Component/s: None
    • Labels: None
    • Sprint: Sprint #4 10/2 - 10/16

    Description

      FINALIZE callbacks are sent asynchronously via CallbackHandler#reset(), while Zk callbacks are queued in the ZkEventThread. A FINALIZE callback can therefore be handled before all pending Zk callbacks have been processed. This creates race conditions. For example, on zk session expiry, a GenericController that receives a FINALIZE callback cleans up all listeners with ZkClient#unsubscribe(), but the Zk callbacks left over in the ZkEventThread are processed afterwards and re-subscribe all the listeners, leaking zk watchers.
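      The ordering problem can be reproduced in a small single-threaded model. This is a sketch under stated assumptions, not Helix code: FinalizeOrderingSketch, its methods, and the path /MyCluster/LIVEINSTANCES are hypothetical; an ArrayDeque stands in for the ZkEventThread queue; and a pending Zk callback is modeled as re-arming its watch (zk watches are one-shot, so a listener typically re-arms one when it re-reads the node). Handling FINALIZE out of band leaves a watch behind, while enqueueing FINALIZE behind the pending callbacks leaves none.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class FinalizeOrderingSketch {
    // Hypothetical model of the paths that currently have a watch armed.
    final Set<String> subscriptions = new HashSet<>();
    // Hypothetical stand-in for the ZkEventThread's FIFO queue of pending events.
    final Deque<Runnable> eventQueue = new ArrayDeque<>();

    void subscribe(String path) { subscriptions.add(path); }

    // What the FINALIZE handler does: drop every watch.
    void unsubscribeAll() { subscriptions.clear(); }

    // A leftover Zk callback: when it finally runs, it re-arms the watch.
    void queueZkCallback(String path) { eventQueue.add(() -> subscribe(path)); }

    void drainEventQueue() {
        Runnable event;
        while ((event = eventQueue.poll()) != null) {
            event.run();
        }
    }

    // Racy: FINALIZE is handled immediately, outside the event queue.
    static Set<String> finalizeOutOfBand() {
        FinalizeOrderingSketch zk = new FinalizeOrderingSketch();
        zk.subscribe("/MyCluster/LIVEINSTANCES");
        zk.queueZkCallback("/MyCluster/LIVEINSTANCES"); // still pending at expiry
        zk.unsubscribeAll();                             // FINALIZE runs first
        zk.drainEventQueue();                            // leftover callback re-subscribes
        return zk.subscriptions;
    }

    // Ordered: FINALIZE is enqueued behind the callbacks already pending.
    static Set<String> finalizeInQueue() {
        FinalizeOrderingSketch zk = new FinalizeOrderingSketch();
        zk.subscribe("/MyCluster/LIVEINSTANCES");
        zk.queueZkCallback("/MyCluster/LIVEINSTANCES");
        zk.eventQueue.add(zk::unsubscribeAll);           // runs after the callback
        zk.drainEventQueue();
        return zk.subscriptions;
    }

    public static void main(String[] args) {
        System.out.println("out of band: " + finalizeOutOfBand());
        System.out.println("in queue:    " + finalizeInQueue());
    }
}
```

      FIFO ordering only protects against callbacks queued before FINALIZE; in this model, a callback queued after it would still re-subscribe.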

      This was observed by setting up two controllers and expiring the leader's zk session (by simulating a long GC pause). The second controller takes over leadership and adds all listeners, but when the former leader recovers from the GC pause, it processes the leftover Zk callbacks and re-subscribes the live-instance listener. As a result it reacts to all live-instance changes even though it has not acquired leadership.
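      One way to keep a recovering ex-leader from re-arming watches, sketched here as an assumption rather than the fix Helix actually shipped, is to have reset() clear an "active" flag and let a Zk callback re-subscribe only while that flag is set. The Handler class, its methods, and the path below are hypothetical, not Helix's CallbackHandler API. A single-thread executor stands in for the ZkEventThread and a CountDownLatch simulates the GC pause. Both methods synchronize on the handler, so a callback cannot check the flag, lose the race to reset(), and then re-subscribe anyway.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GuardedHandlerSketch {
    // Hypothetical listener handler; not Helix's actual CallbackHandler.
    static final class Handler {
        private final String path;
        private final Set<String> subscriptions = new HashSet<>();
        private boolean active = true;

        Handler(String path) {
            this.path = path;
            subscriptions.add(path);
        }

        // FINALIZE: mark inactive and drop the watch in one atomic step.
        synchronized void reset() {
            active = false;
            subscriptions.remove(path);
        }

        // Leftover Zk callback: re-arm the watch only while still active.
        synchronized void onZkCallback() {
            if (active) {
                subscriptions.add(path);
            }
        }

        synchronized Set<String> subscriptions() {
            return new HashSet<>(subscriptions);
        }
    }

    static Set<String> run() throws InterruptedException {
        Handler handler = new Handler("/MyCluster/LIVEINSTANCES");
        // Single thread standing in for the ZkEventThread.
        ExecutorService zkEventThread = Executors.newSingleThreadExecutor();
        CountDownLatch gcPause = new CountDownLatch(1);

        // Stall the event thread, then queue a Zk callback behind the stall.
        zkEventThread.submit(() -> {
            try {
                gcPause.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        zkEventThread.submit(handler::onZkCallback);

        handler.reset();      // FINALIZE is delivered on another thread first
        gcPause.countDown();  // the "GC pause" ends; the leftover callback runs

        zkEventThread.shutdown();
        zkEventThread.awaitTermination(5, TimeUnit.SECONDS);
        return handler.subscriptions();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("watches after recovery: " + run());
    }
}
```

      Because the flag is cleared before the stalled callback runs, the callback finds the handler inactive and leaves no watch behind, whichever thread delivered FINALIZE.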


People

    • Assignee: dafu Zhen Zhang
    • Reporter: dafu Zhen Zhang
    • Votes: 0
    • Watchers: 2
