ZooKeeper / ZOOKEEPER-1367

Data inconsistencies and unexpired ephemeral nodes after cluster restart

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.4.2
    • Fix Version/s: 3.4.3, 3.3.5, 3.5.0
    • Component/s: server
    • Labels:
      None
    • Environment:

      Debian Squeeze, 64-bit

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fix Data inconsistencies and unexpired ephemeral nodes after cluster restart.

      Description

      In one of our tests, we have a cluster of three ZooKeeper servers. We kill all three, and then restart just two of them. Sometimes we notice that on one of the restarted servers, ephemeral nodes from previous sessions do not get deleted, while on the other server they do. We are effectively running 3.4.2, though technically we are running 3.4.1 with the patch manually applied for ZOOKEEPER-1333 and a C client for 3.4.1 with the patches for ZOOKEEPER-1163.

      When I connected zkCli.sh to the first server (90.0.0.221, zkid 84), I saw only one znode under a particular path:

      [zk: 90.0.0.221:2888(CONNECTED) 0] ls /election/zkrsm
      [nominee0000000011]
      [zk: 90.0.0.221:2888(CONNECTED) 1] get /election/zkrsm/nominee0000000011
      90.0.0.222:7777
      cZxid = 0x400000027
      ctime = Thu Jan 19 08:18:24 UTC 2012
      mZxid = 0x400000027
      mtime = Thu Jan 19 08:18:24 UTC 2012
      pZxid = 0x400000027
      cversion = 0
      dataVersion = 0
      aclVersion = 0
      ephemeralOwner = 0xa234f4f3bc220001
      dataLength = 16
      numChildren = 0

      However, when I connected zkCli.sh to the second server (90.0.0.222, zkid 251), I saw three znodes under that same path:

      [zk: 90.0.0.222:2888(CONNECTED) 2] ls /election/zkrsm
      [nominee0000000006, nominee0000000010, nominee0000000011]
      [zk: 90.0.0.222:2888(CONNECTED) 2] get /election/zkrsm/nominee0000000011
      90.0.0.222:7777
      cZxid = 0x400000027
      ctime = Thu Jan 19 08:18:24 UTC 2012
      mZxid = 0x400000027
      mtime = Thu Jan 19 08:18:24 UTC 2012
      pZxid = 0x400000027
      cversion = 0
      dataVersion = 0
      aclVersion = 0
      ephemeralOwner = 0xa234f4f3bc220001
      dataLength = 16
      numChildren = 0
      [zk: 90.0.0.222:2888(CONNECTED) 3] get /election/zkrsm/nominee0000000010
      90.0.0.221:7777
      cZxid = 0x30000014c
      ctime = Thu Jan 19 07:53:42 UTC 2012
      mZxid = 0x30000014c
      mtime = Thu Jan 19 07:53:42 UTC 2012
      pZxid = 0x30000014c
      cversion = 0
      dataVersion = 0
      aclVersion = 0
      ephemeralOwner = 0xa234f4f3bc220000
      dataLength = 16
      numChildren = 0
      [zk: 90.0.0.222:2888(CONNECTED) 4] get /election/zkrsm/nominee0000000006
      90.0.0.223:7777
      cZxid = 0x200000cab
      ctime = Thu Jan 19 08:00:30 UTC 2012
      mZxid = 0x200000cab
      mtime = Thu Jan 19 08:00:30 UTC 2012
      pZxid = 0x200000cab
      cversion = 0
      dataVersion = 0
      aclVersion = 0
      ephemeralOwner = 0x5434f5074e040002
      dataLength = 16
      numChildren = 0

      These znodes never went away for the lifetime of that server, as seen by any client connected directly to it. Note that the cluster is still configured with all three servers; the third one (90.0.0.223, zkid 162) is down.
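
      The stat fields quoted above can be unpacked to see where each znode and its owning session came from. The following is a minimal Python sketch, not part of the original report. It assumes ZooKeeper's usual 64-bit layouts: a zxid carries the epoch in its high 32 bits and a counter in its low 32 bits, and a session id carries the id of the server that created it in its top byte. The helper names are hypothetical.

```python
from typing import Dict, Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a 64-bit zxid into (epoch, counter).

    Assumption: high 32 bits are the epoch, low 32 bits the counter.
    """
    return zxid >> 32, zxid & 0xFFFFFFFF


def session_server_id(session_id: int) -> int:
    """Return the top byte of a session id.

    Assumption: the server that created the session stores its id there.
    """
    return (session_id >> 56) & 0xFF


# (cZxid, ephemeralOwner) pairs copied from the zkCli.sh output in this report.
NODES = {
    "nominee0000000011": (0x400000027, 0xA234F4F3BC220001),
    "nominee0000000010": (0x30000014C, 0xA234F4F3BC220000),
    "nominee0000000006": (0x200000CAB, 0x5434F5074E040002),
}


def describe(nodes: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int, int]]:
    """Map each znode name to (epoch, counter, owner server id)."""
    result = {}
    for name, (czxid, owner) in nodes.items():
        epoch, counter = split_zxid(czxid)
        result[name] = (epoch, counter, session_server_id(owner))
    return result


if __name__ == "__main__":
    for name, (epoch, counter, server) in sorted(describe(NODES).items()):
        print(f"{name}: epoch={epoch} counter={counter} owner_server={server}")
```

      Under these assumptions, the two extra znodes on 90.0.0.222 were created in epochs 3 and 2, earlier than the epoch-4 znode that both servers agree on, and the top bytes of the owner session ids (162 and 84) coincide with zkids mentioned in this report.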

      I captured the data/snapshot directories for the two live servers. When I start a single-node server from each directory, I can briefly see that the inconsistent data is present in those logs, though the ephemeral nodes seem to get (correctly) cleaned up soon after the server starts.
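
      When comparing two servers (or two captured data directories loaded into single-node servers), the divergence can be checked mechanically rather than by eye. The sketch below is an illustration only, not something used in the original investigation: it takes the text printed by `ls` in zkCli.sh on each server and reports which children appear on only one side. The function names are hypothetical.

```python
from typing import Dict, List, Set


def parse_ls(output: str) -> Set[str]:
    """Parse one line of zkCli.sh `ls` output into a set of child names.

    Accepts both '[a, b, c]' and a bare space-separated 'a b c'.
    """
    cleaned = output.strip().strip("[]")
    return {name for name in cleaned.replace(",", " ").split() if name}


def diff_children(first: Set[str], second: Set[str]) -> Dict[str, List[str]]:
    """Report children seen on only one server, plus those seen on both."""
    return {
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
        "common": sorted(first & second),
    }


if __name__ == "__main__":
    # Listings of /election/zkrsm as quoted in this report.
    server_221 = parse_ls("[nominee0000000011]")
    server_222 = parse_ls("nominee0000000006 nominee0000000010 nominee0000000011")
    for key, names in diff_children(server_221, server_222).items():
        print(key, names)
```

      A non-empty `only_first` or `only_second` list for the same path on two servers that are both serving requests indicates the kind of inconsistency described here.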

      I will upload a tar containing the debug logs and data directories from the failure. I think we can reproduce it regularly if you need more info.

      1. ZOOKEEPER-1367-3.3.patch
        12 kB
        Benjamin Reed
      2. 1367-3.3.patch
        8 kB
        Ted Yu
      3. ZOOKEEPER-1367-3.4.patch
        16 kB
        Benjamin Reed
      4. ZOOKEEPER-1367.patch
        16 kB
        Benjamin Reed
      5. ZOOKEEPER-1367.patch
        15 kB
        Benjamin Reed
      6. ZOOKEEPER-1367.tgz
        5.06 MB
        Jeremy Stribling

        Activity

        Each change is listed as Field: Original Value -> New Value.

        Benjamin Reed made changes -
          Attachment: (none) -> ZOOKEEPER-1367-3.3.patch [ 12512604 ]
        Mahadev konar made changes -
          Status: Patch Available [ 10002 ] -> Resolved [ 5 ]
          Hadoop Flags: (none) -> Reviewed [ 10343 ]
          Release Note: (none) -> Fix Data inconsistencies and unexpired ephemeral nodes after cluster restart.
          Fix Version/s: (none) -> 3.3.5 [ 12319081 ]
          Fix Version/s: (none) -> 3.5.0 [ 12316644 ]
          Resolution: (none) -> Fixed [ 1 ]
        Patrick Hunt made changes -
          Assignee: (none) -> Benjamin Reed [ breed ]
        Ted Yu made changes -
          Attachment: (none) -> 1367-3.3.patch [ 12512285 ]
        Benjamin Reed made changes -
          Attachment: (none) -> ZOOKEEPER-1367-3.4.patch [ 12512281 ]
        Benjamin Reed made changes -
          Attachment: (none) -> ZOOKEEPER-1367.patch [ 12512280 ]
        Benjamin Reed made changes -
          Status: Open [ 1 ] -> Patch Available [ 10002 ]
        Benjamin Reed made changes -
          Attachment: (none) -> ZOOKEEPER-1367.patch [ 12512274 ]
        Mahadev konar made changes -
          Fix Version/s: (none) -> 3.4.3 [ 12319288 ]
          Priority: Major [ 3 ] -> Blocker [ 1 ]
        Jeremy Stribling made changes -
          Attachment: (none) -> ZOOKEEPER-1367.tgz [ 12511304 ]
        Jeremy Stribling created issue -

          People

          • Assignee:
            Benjamin Reed
          • Reporter:
            Jeremy Stribling
          • Votes:
            0
          • Watchers:
            8
