Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-4837

Network issue causes ephemeral node unremoved after the session expiration

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Critical
    • Resolution: Unresolved
    • 3.10.0
    • None
    • quorum, server

    Description

      In our testing cluster with the latest ZooKeeper version (66202cb), we observed that sometimes an ephemeral node never gets deleted if there is a network issue during the PROPOSAL request, even after the session expires. This bug is essentially related to ZOOKEEPER-2355, but the issue was not entirely fixed in the previous patch. We also tested on some related open PRs (e.g., https://github.com/apache/zookeeper/pull/2152 and https://github.com/apache/zookeeper/pull/1925 ), and confirmed the issue exists after the proposed fix.

       

      Steps to reproduce this bug:

      1. Start a cluster with 3 servers, follower A, leader B, follower C
      2. Open a zk client in server A
      3. Create an ephemeral node in the client
      4. Inject network issue in all server that causes SocketTimeoutException during readPacket if the packet is a PROPOSAL
      5. Close the client
      6. Wait until the cluster is stable (the leader will change between B and C several times)
      7. Remove the network issue from all server
      8. Check every server for ephemeral node existence. The ephemeral node will exist in server A. However, server B and C will not have the ephemeral node anymore.

       

      Essentially the bug is caused by loadDatabase loading a snapshot with a higher Zxid than the truncated log, causing fastForwardFromEdits to fail when trying to replay the transactions. For example, if one of the follower has a lastProcessedZxid of 0x200000001 and last snapshot snapshot.200000001, and the leader sends a TRUNC with a zxid of 100000002, truncateLog will truncate the follower's log to 100000002. However, loadDatabase will load snapshot.200000001. So when fastForwardFromEdits happens, it will set the data tree to 200000001 instead of 100000002.

       

      We also attached a test case to reproduce this issue. Note that this test case is still pretty flaky at this moment.

       

      We proposed to fix this case by loading the database from the last snapshot that happens before the last truncated-log entry during truncateLog. See our PR attached. Of course, this may not be the ideal solution and we welcome suggestions. Some other potential solutions include: 

      (1) Disable fastForwardDatabase in shutdown

      (2) Run setLastProcessedZxid at the end of Learner's syncWithLeader function if the packet is Leader.DIFF 

       

      Your insights are very much appreciated. We will continue following up this issue until it is resolved.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              dparikesit Dimas Shidqi Parikesit
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 10m
                  10m