Details
-
Bug
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
None
-
None
-
None
Description
Hi,
There are two problems with the barrier example code in the tutorial:
1) A znode created by a process in the function enter() is created with SEQUENTIAL suffix, however, the name of a znode deleted in the function leave() doesn't have this suffix. Actually, the leave() function tries to delete a nonexistent node => a KeeperException is thrown, which is caught silently => the process terminates without waiting for the barrier.
2) It seems that the very idea of leaving the barrier by deleting ephemeral nodes is problematic. Consider the following scenario: there are two clients: C1 and C2.
- C1 enters the barrier, creates a znode /b1/C1, checks that it's alone and starts waiting for the second client to come.
- C2 enters the barrier and creates a znode /b1/C2 - the notification to C1 is sent but still not delivered.
- C2 observes that there are enough children to /b1, enters the barrier, executes its own operations and invokes leave() procedure.
- during the leave() procedure C2 removes its znode /b1/C2 and exits.
- when the notification about C2's arrival finally arrives to C1, C1 checks the children of /b1 and doesn't find C2's znode: C1 is stuck.
The solution to this data race would be to create special znodes for leaving the barrier, similarly to the way they are created for entering the barrier.
Thanks,
Dima