  ZooKeeper / ZOOKEEPER-4689

Node may not be accessible due to the inconsistent ACL reference map after SNAP sync (again)



    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.6.0, 3.7.0, 3.8.0
    • Fix Version/s: None
    • Component/s: server


      In ZooKeeper, we take a "fuzzy snapshot": we don't make a copy of the DataTree or grab a lock when serializing it. Instead, we note down the zxid at which we start serializing the DataTree, serialize it while it's still being mutated, and, after deserializing it, replay the transactions that arrived after the snapshot started. The idea is that those transactions are idempotent.
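      As a rough illustration (the names and structure below are ours, not ZooKeeper's), a fuzzy snapshot is only safe because replaying a transaction that may or may not already be captured in the snapshot converges to the same state:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a fuzzy snapshot (hypothetical names, not ZooKeeper code):
// the snapshot may or may not have captured a txn that ran concurrently with
// serialization, so on restore we replay every txn after the snapshot zxid.
// "Set path to value" is idempotent, so replaying it twice is harmless.
public class FuzzySnapshotSketch {
    static Map<String, String> restore(Map<String, String> snapshot,
                                       Map<String, String> txnsAfterSnapshotZxid) {
        Map<String, String> tree = new HashMap<>(snapshot);
        tree.putAll(txnsAfterSnapshotZxid); // replay; idempotent by design
        return tree;
    }
}
```

      Whether or not the concurrent write landed in the serialized tree, the restored tree is identical after replay.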

      ZooKeeper also interns ACLs: it keeps a [long -> ACL] map and stores the `long` in each node, since many nodes tend to share the same ACL.
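      The interning can be pictured like this (a simplified sketch; the real logic lives in ReferenceCountedACLCache, and the helper names and String-typed ACLs here are ours):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of ZooKeeper's interned ACL cache (hypothetical names;
// the real class is ReferenceCountedACLCache).
public class AclCacheSketch {
    private final Map<Long, List<String>> longKeyMap = new HashMap<>();
    private final Map<List<String>, Long> aclKeyMap = new HashMap<>();
    private long aclIndex = 0;

    // Intern an ACL list, reusing the existing key if we've seen it before.
    public synchronized long convertAcls(List<String> acls) {
        Long existing = aclKeyMap.get(acls);
        if (existing != null) {
            return existing;
        }
        long key = ++aclIndex;
        longKeyMap.put(key, acls);
        aclKeyMap.put(acls, key);
        return key; // this long is what each node stores
    }

    // Resolve a node's stored long back to its ACL; null means the node
    // points at an entry that is not in the cache -- exactly this bug.
    public synchronized List<String> convertLong(long key) {
        return longKeyMap.get(key);
    }
}
```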

      When serializing the DataTree, we first serialize the ACL cache and then the nodes. With the following sequence, a node can end up pointing to an ACL entry that never made it into the serialized cache:

      1. Serialize ACL
      2. Create node with new ACL
      3. Serialize node

      ZOOKEEPER-3306 fixed this by inserting the ACL into the cache when `DataTree.createNode` is called during transaction replay and the node already exists. However, it only inserts the ACL into the cache; it does not update the node's interned ACL reference to point to the new entry.

      On top of that, the long value we get for an ACL can differ across servers even when they process the same events in the same zxid order. Specifically, we keep an `aclIndex` pointing to the highest entry handed out so far and increment it whenever we intern an ACL we've never seen before.

      With ZOOKEEPER-2214, we started reference counting entries in the ACL cache and removing no-longer-used entries from it.

      Say the current aclIndex is 10. If we create a node with a previously unseen ACL and then delete that node, aclIndex increments to 11 while the entry itself is removed. However, when we deserialize the tree, we set aclIndex to the highest key present in the ACL cache, so it reverts to 10. The aclIndex inconsistency is harmless on its own, but it breaks the ZOOKEEPER-3306 patch.
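      A sketch of that divergence (hypothetical names; this models the behavior described above, not the actual ReferenceCountedACLCache code):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the aclIndex divergence (names are ours, not ZooKeeper's).
public class AclIndexSketch {
    final Map<Long, String> cache = new HashMap<>();
    long aclIndex;

    AclIndexSketch(long start) { this.aclIndex = start; }

    // Intern: reuse an existing key, or mint the next one.
    long intern(String acl) {
        for (Map.Entry<Long, String> e : cache.entrySet()) {
            if (e.getValue().equals(acl)) {
                return e.getKey();
            }
        }
        cache.put(++aclIndex, acl);
        return aclIndex;
    }

    // Reference count dropped to zero: the entry goes away, but aclIndex
    // is NOT decremented (ZOOKEEPER-2214 behavior).
    void remove(long key) { cache.remove(key); }

    // On deserialization, aclIndex is rebuilt as the max key still present.
    static AclIndexSketch deserialize(Map<Long, String> snapshot) {
        long max = snapshot.keySet().stream()
                .mapToLong(Long::longValue).max().orElse(0L);
        AclIndexSketch s = new AclIndexSketch(max);
        s.cache.putAll(snapshot);
        return s;
    }
}
```

      With entries up to key 10, interning and then removing one ACL leaves the leader's aclIndex at 11, while a server that deserializes the resulting cache reverts to 10; the next new ACL then gets key 12 on the leader but key 11 on the follower.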

      Now if we follow the same scenario mentioned in ZOOKEEPER-3306:

      1. Leader creates ACL entry 11 and then deletes it when the owning node is deleted
      2. Server A starts a SNAP sync with the leader
      3. After the ACL map has been serialized to server A, a txn T1 creates a node N1 with a new ACL_1 that did not exist in the ACL map
      4. On the leader, after this txn, the ACL map contains 12 -> (ACL_1, COUNT: 1) and the data tree has N1 -> 12
      5. On server A, the fuzzy snapshot contains an ACL map with max ID 10, and N1 -> 12
      6. When replaying txn T1, server A adds 11 -> (ACL_1, COUNT: 1) to the ACL cache, but node N1 still points to 12.

      So N1 is left pointing to an ACL entry that does not exist, making the node inaccessible.

      There are two ways to fix this:

      1. Make aclIndex consistent across deserialization (either by serializing it in the snapshot or by taking care to decrement it when removing entries from the cache)
      2. Extend the ZOOKEEPER-3306 patch so that we also repoint the node's ACL at the new key when the previous entry no longer exists in the ACL table.
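      A sketch of what solution 2 amounts to (hypothetical names; this is not the actual patch):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of solution 2: when createNode runs during replay and
// the node already exists, intern the txn's ACL (as ZOOKEEPER-3306 already
// does) AND repoint the node if its stored key no longer resolves.
public class Fix2Sketch {
    public static class Node {
        long aclKey;
        Node(long aclKey) { this.aclKey = aclKey; }
    }

    final Map<Long, String> cache = new HashMap<>();
    final Map<String, Long> reverse = new HashMap<>();
    long aclIndex;

    Fix2Sketch(long startIndex) { this.aclIndex = startIndex; }

    long intern(String acl) {
        Long existing = reverse.get(acl);
        if (existing != null) {
            return existing;
        }
        cache.put(++aclIndex, acl);
        reverse.put(acl, aclIndex);
        return aclIndex;
    }

    // Replay path: the node already exists from the fuzzy snapshot.
    void onReplayOfExistingNode(Node node, String txnAcl) {
        long key = intern(txnAcl);            // ZOOKEEPER-3306: entry exists
        if (!cache.containsKey(node.aclKey)) {
            node.aclKey = key;                // extra step: fix dangling ref
        }
    }
}
```

      In the scenario above, the follower replays T1 with aclIndex at 10, interns ACL_1 as key 11, notices N1's stored key 12 is dangling, and repoints N1 at 11.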


      I think solution 2 is nicer, as the aclIndex inconsistency itself is not a problem. With solution 1, we'd still be implicitly depending on aclIndex consistency and the ordering of events, which is harder to reason about and seems more fragile than solution 2.

      I'm going to send a patch for solution 2, but please let me know if you disagree and I'm happy to go with solution 1 instead.





              Assignee: Unassigned
              Reporter: Adam Yi



                Time Tracking

                  Original Estimate - Not Specified
                  Remaining Estimate - 0h
                  Time Spent - 1h