Uploaded image for project: 'ZooKeeper'
  1. ZooKeeper
  2. ZOOKEEPER-1486

A couple of bugs in the tutorial code

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • None
    • None
    • documentation
    • None

    Description

      Hi,
      There are two problems with the barrier example code in the tutorial:

      1) A znode created by a process in the function enter() is created with SEQUENTIAL suffix, however, the name of a znode deleted in the function leave() doesn't have this suffix. Actually, the leave() function tries to delete a nonexistent node => a KeeperException is thrown, which is caught silently => the process terminates without waiting for the barrier.

      2) It seems that the very idea of leaving the barrier by deleting ephemeral nodes is problematic. Consider the following scenario: there are two clients: C1 and C2.

      • C1 enters the barrier, creates a znode /b1/C1, checks that it's alone and starts waiting for the second client to come.
      • C2 enters the barrier and creates a znode /b1/C2 - the notification to C1 is sent but still not delivered.
      • C2 observes that there are enough children to /b1, enters the barrier, executes its own operations and invokes leave() procedure.
      • during the leave() procedure C2 removes its znode /b1/C2 and exits.
      • when the notification about C2's arrival finally arrives to C1, C1 checks the children of /b1 and doesn't find C2's znode: C1 is stuck.
        The solution to this data race would be to create special znodes for leaving the barrier, similarly to the way they are created for entering the barrier.

      Thanks,
      Dima

      Attachments

        Activity

          People

            Unassigned Unassigned
            dimap Dmitri Perelman
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: