Regarding the implementation of lock acquisition/release in terms of ZooKeeper, can you elaborate on how you are proposing to map the lock hierarchy onto the znode hierarchy?
It sounds like the znode paths will correspond to database.table.partition.subpartition... paths, which is good.
However, does the lock acquisition recipe itself need to reflect the hierarchy, which I think is what you are proposing? In other words, can't we just build a flat list of object locks to take (covering both table-level and partition-level locks), sort it, and then acquire each lock independently using the non-hierarchical recipe (except, as you mention, failing fast instead of waiting)? If any acquisition fails, delete all the lock znodes acquired so far before re-entering the retry loop.
Assuming the sort order matches the compound naming scheme, this should guarantee hierarchical lock acquisition order within each table: a table's lock always sorts, and is therefore taken, before the locks on its partitions.
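To make the flat-list idea concrete, here is a minimal sketch. It assumes a hypothetical in-memory `InMemoryLockStore` standing in for ZooKeeper, treats every lock as exclusive (the shared/exclusive distinction is left out), and uses dotted `database.table.partition` names; `try_acquire`, `acquire_all`, and `acquire_with_retry` are illustrative names, not part of any real API.

```python
import time
from typing import Dict, Iterable, List


class InMemoryLockStore:
    """Hypothetical stand-in for ZooKeeper: at most one owner per lock path."""

    def __init__(self) -> None:
        # Dict preserves insertion order, i.e. the order locks were acquired.
        self.owners: Dict[str, str] = {}

    def try_acquire(self, path: str, owner: str) -> bool:
        # Fail fast: return immediately instead of waiting on a held lock.
        if path in self.owners:
            return False
        self.owners[path] = owner
        return True

    def release(self, path: str, owner: str) -> None:
        if self.owners.get(path) == owner:
            del self.owners[path]


def lock_key(path: str) -> List[str]:
    # Compare component by component, so "db.t" sorts before "db.t.p1"
    # (a table before its partitions).
    return path.split(".")


def acquire_all(store: InMemoryLockStore, paths: Iterable[str], owner: str) -> bool:
    """Take every lock in sorted order; on any failure, undo what was taken."""
    taken: List[str] = []
    for path in sorted(set(paths), key=lock_key):
        if not store.try_acquire(path, owner):
            for held in reversed(taken):
                store.release(held, owner)
            return False
        taken.append(path)
    return True


def acquire_with_retry(store: InMemoryLockStore, paths: Iterable[str], owner: str,
                       attempts: int = 3, backoff_s: float = 0.0) -> bool:
    """Retry loop: each attempt starts from a clean slate of held locks."""
    wanted = list(paths)
    for _ in range(attempts):
        if acquire_all(store, wanted, owner):
            return True
        time.sleep(backoff_s)
    return False
```

Because a failed attempt releases everything before retrying, no client ever waits while holding a lock. Combined with the global sort order, that is what rules out deadlock without encoding the hierarchy into the recipe itself.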
Also, I do not understand the part below.
"The 'X' lock for table T is specified as follows:
- For all parent znodes of T, call getChildren() without setting the watch flag."
Do you mean "for the parent znode of T" rather than "all parent znodes of T", with this step applying only to the case where T is actually a partition?