The ZK API is designed to be simple, flexible, and general, so it can serve many scenarios: coordination, member health tracking, metadata storage, etc.
But this general design has a cost: it makes client code heavy and inefficient for recipes such as distributed lock and semaphore.
Currently, the typical client-side semaphore implementation without waiting time works as follows:
- client A creates a sequential, ephemeral node N-1
- client B creates a sequential, ephemeral node N-2
- clients A and B each query all children to check whether they hold the lock node with the smallest sequential id
- since client A has the smaller sequential id, it is the semaphore owner (assuming the semaphore value is 1)
- client B deletes its node, closes the session, and probably tries again later from step 2
Every contender issues 4 writes (create session, create lock, delete lock, close session) and 1 read (get children), which is pretty heavy and does not scale well.
We actually hit this issue internally with one heavy semaphore use case, and we had to create dozens of ensembles to support its traffic.
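To make the cost concrete, here is a minimal sketch of the client-side flow against an in-memory stand-in that only counts operations. It does not use a real ZooKeeper client; `FakeEnsemble`, `try_acquire`, and all method names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeEnsemble:
    """In-memory stand-in for an ensemble; it only counts operations."""
    writes: int = 0
    reads: int = 0
    next_seq: int = 0
    children: dict = field(default_factory=dict)  # sequence id -> session id

    def create_session(self, session_id):
        self.writes += 1

    def create_sequential_ephemeral(self, session_id):
        self.writes += 1
        seq = self.next_seq
        self.next_seq += 1
        self.children[seq] = session_id
        return seq

    def get_children(self):
        self.reads += 1
        return sorted(self.children)

    def delete(self, seq):
        self.writes += 1
        del self.children[seq]

    def close_session(self, session_id):
        self.writes += 1


def try_acquire(zk, session_id, permits=1):
    """Non-blocking client-side attempt; returns True if a permit is held."""
    zk.create_session(session_id)
    seq = zk.create_sequential_ephemeral(session_id)
    # We hold a permit only if our node is among the `permits` smallest ids.
    if seq in zk.get_children()[:permits]:
        return True
    # Lost the race: clean up, which costs two more writes.
    zk.delete(seq)
    zk.close_session(session_id)
    return False
```

In this model a losing contender pays 4 writes and 1 read just to learn that the semaphore is taken, and in a real ensemble each of those writes would go through the full proposal and commit path.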
To make the semaphore recipe more efficient, we can move the semaphore implementation to the server side. The leader has all the context about who will win the semaphore/lock at txn preparation time, so it can short-circuit and fail the losing contenders directly, without proposing and committing their create/delete lock transactions.
To implement this, we need to add a new semaphore API, which is supposed to replace the client-side lock, leader election (semaphore value 1), and general semaphore use cases.
We started designing and implementing it recently. It will be based on another big improvement that we have almost finished and will soon upstream in ZOOKEEPER-3594, which skips proposing requests with error transactions.
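The short-circuit idea could be sketched roughly as follows. This is a toy model, not the proposed API or the actual leader code: `FakeLeader`, `acquire`, `release`, and the counters are hypothetical names, session handling is left out, and each proposal is assumed to be applied immediately, so in-flight transactions are not modeled.

```python
class FakeLeader:
    """Toy leader that decides semaphore requests at preparation time."""

    def __init__(self, permits=1):
        self.permits = permits
        self.holders = set()   # sessions currently holding a permit
        self.proposals = 0     # transactions that would go through quorum
        self.rejected = 0      # requests failed without any proposal

    def acquire(self, session_id):
        if session_id in self.holders:
            return True  # already a holder, nothing to propose
        if len(self.holders) >= self.permits:
            # Short circuit: the outcome is already known, so fail the
            # contender directly instead of proposing a doomed txn.
            self.rejected += 1
            return False
        self.proposals += 1
        self.holders.add(session_id)
        return True

    def release(self, session_id):
        if session_id not in self.holders:
            self.rejected += 1
            return False
        self.proposals += 1
        self.holders.remove(session_id)
        return True
```

In this sketch a losing contender costs zero proposals, and only successful acquires and releases reach the quorum path.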
Meanwhile, we'd like to hear some early feedback from the community about this feature.