  1. ZooKeeper
  2. ZOOKEEPER-3145

Potential watch missing issue due to stale pzxid when replaying CloseSession txn with fuzzy snapshot

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.5.4, 3.6.0, 3.4.13
    • Fix Version/s: 3.6.0
    • Component/s: server

    Description

      This is another issue I found recently; we haven't seen this problem in prod (or maybe we just didn't notice it).

       
      Currently, CloseSession is not idempotent: executing the CloseSession txn twice does not produce the same result.
       
      The problem is that closeSession only checks which ephemeral nodes are associated with the session based on the current state. Nodes deleted while a fuzzy snapshot was being taken won't be deleted again when the txn is replayed.
       
      This looks fine, since the nodes are already gone, but there is a problem with the pzxid of the parent node. The snapshot is taken fuzzily, so it's possible that the parent had already been serialized while the nodes were being deleted by the closeSession txn. When the closeSession txn is replayed, the pzxid in the snapshot will not be updated, because the replay doesn't know which paths were deleted, so it can't patch the pzxid the way we did for deleteNode in ZOOKEEPER-3125.
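
      The interleaving described above can be reproduced with a toy model. This is only a sketch: `FuzzySnapshotDemo`, `Node`, the `/app` paths, and the zxid values are hypothetical and are not ZooKeeper's actual DataTree code. It serializes the parent first, runs closeSession on the live tree, finishes the snapshot, then replays the same closeSession against the restored snapshot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the race; NOT ZooKeeper's real DataTree. All names are hypothetical. */
final class FuzzySnapshotDemo {
    static final class Node {
        long pzxid;                // zxid of the last change to this node's children
        final long ephemeralOwner; // 0 means the node is not ephemeral
        Node(long pzxid, long ephemeralOwner) {
            this.pzxid = pzxid;
            this.ephemeralOwner = ephemeralOwner;
        }
    }

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    /** closeSession as described: it only deletes ephemerals found in the current state. */
    static void closeSession(Map<String, Node> tree, long session, long zxid) {
        List<String> owned = new ArrayList<>();
        for (Map.Entry<String, Node> e : tree.entrySet()) {
            if (e.getValue().ephemeralOwner == session) {
                owned.add(e.getKey());
            }
        }
        for (String path : owned) {
            tree.remove(path);
            Node parent = tree.get(parentOf(path));
            if (parent != null) {
                parent.pzxid = zxid;
            }
        }
    }

    /** Returns { pzxid of /app on the live tree, pzxid of /app after restore + replay }. */
    static long[] run() {
        long session = 0x1L;
        Map<String, Node> live = new HashMap<>();
        live.put("/app", new Node(1L, 0L));
        live.put("/app/e1", new Node(1L, session));

        // Fuzzy snapshot, step 1: the parent is serialized with its current pzxid.
        Map<String, Node> snapshot = new HashMap<>();
        snapshot.put("/app", new Node(live.get("/app").pzxid, 0L));

        // Meanwhile the closeSession txn (zxid 5) is applied to the live tree.
        closeSession(live, session, 5L);

        // Fuzzy snapshot, step 2: serialize whatever else still exists (/app/e1 is gone).
        for (Map.Entry<String, Node> e : live.entrySet()) {
            Node n = e.getValue();
            snapshot.putIfAbsent(e.getKey(), new Node(n.pzxid, n.ephemeralOwner));
        }

        // Restore from the snapshot and replay the txn: no ephemerals are found,
        // so nothing is deleted and the parent's pzxid is never patched.
        closeSession(snapshot, session, 5L);

        return new long[] { live.get("/app").pzxid, snapshot.get("/app").pzxid };
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println("live pzxid=" + r[0] + ", restored pzxid=" + r[1]);
    }
}
```

      In this model the live tree ends with pzxid 5 on the parent, while the restored tree keeps the stale value 1.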
       
      The inconsistent (stale) pzxid can lead to missed watch notifications when a client reconnects with setWatches.
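
      To see why staleness matters, here is a minimal sketch of the kind of comparison involved. It assumes the server decides whether to fire a children-changed event on reconnect by comparing the parent's pzxid with the last zxid the client saw; the class and method names are hypothetical, not the server's API.

```java
/** Sketch of the reconnect-time check for a child watch; names are hypothetical. */
final class SetWatchesCheck {
    /**
     * Assumed rule: the children-changed event fires on reconnect only if the
     * parent's children changed after the zxid the client last saw.
     */
    static boolean firesChildrenChanged(long parentPzxid, long clientLastZxid) {
        return parentPzxid > clientLastZxid;
    }

    public static void main(String[] args) {
        long clientLastZxid = 3L; // client disconnected before the closeSession at zxid 5
        System.out.println("correct pzxid (5): " + firesChildrenChanged(5L, clientLastZxid));
        System.out.println("stale pzxid (1):   " + firesChildrenChanged(1L, clientLastZxid));
    }
}
```

      With the correct pzxid the client is notified; with the stale pzxid the check fails, the watch is silently re-registered, and the client never learns that the children changed.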
       
      This JIRA fixes the issue by adding CloseSessionTxn, which records all the nodes deleted by that CloseSession txn, so that we know which nodes to update when replaying the txn.
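
      A rough sketch of the idea, under the assumption that the txn carries the list of deleted paths and that replay patches the parent's pzxid even when the child is already missing. `CloseSessionReplayDemo`, `replayCloseSession`, and the `deletedPaths` parameter are illustrative names only, not the actual patch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy replay of a close-session txn that records its deleted paths; names are hypothetical. */
final class CloseSessionReplayDemo {
    static final class Node {
        long pzxid;
        Node(long pzxid) { this.pzxid = pzxid; }
    }

    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? "/" : path.substring(0, i);
    }

    /** Replays the txn using the recorded paths instead of the current ephemeral set. */
    static void replayCloseSession(Map<String, Node> tree, List<String> deletedPaths, long zxid) {
        for (String path : deletedPaths) {
            tree.remove(path); // no-op if the snapshot already lacks the node
            Node parent = tree.get(parentOf(path));
            // Patch the parent even if the child was already gone; never move pzxid backwards.
            if (parent != null && parent.pzxid < zxid) {
                parent.pzxid = zxid;
            }
        }
    }

    /** Returns the parent's pzxid after replaying the same txn twice. */
    static long run() {
        Map<String, Node> restored = new HashMap<>();
        restored.put("/app", new Node(1L)); // stale parent; the child is already missing
        List<String> deletedPaths = Arrays.asList("/app/e1");
        replayCloseSession(restored, deletedPaths, 5L);
        replayCloseSession(restored, deletedPaths, 5L); // second replay gives the same result
        return restored.get("/app").pzxid;
    }

    public static void main(String[] args) {
        System.out.println("restored pzxid=" + run());
    }
}
```

      Because the paths come from the txn itself, the replay result no longer depends on what the snapshot happened to capture, which is what makes the operation idempotent in this model.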

          People

            Assignee: Fangmin Lv (lvfangmin)
            Reporter: Fangmin Lv (lvfangmin)
            Votes: 1
            Watchers: 5

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 7h 20m