Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
- None
- None
- Reviewed
Description
In ZKUtil#multiOrSequential, we accept a list of ZKUtilOps and pass it down to the ZooKeeper#multi(Iterable<Op>) method.
One problem with this approach is that a caller may hand us a list of ZNode mutations so large that the single batch exceeds the allowable client packet length, which is specified by jute.maxbuffer.
This problem can manifest when a disabled peer has a large number of WALs queued in ZooKeeper for replication. When that peer is dropped, the RS submits deletes for those queued WALs. The RS then sees ConnectionLoss on the resulting multi() calls, because the client message is too large (we are trying to delete too many WALs at once). The result (at least in branch-1-ish versions) is that the RS aborts after exhausting the ZK retries, since this operation can never succeed.
A simple fix would be to impose a maximum number of Ops per batch inside ZKUtil and split the caller-submitted batch into smaller chunks. Before making such a change, I need to make sure that no caller depends on the atomicity of the whole batch. I'm not sure what ZK provides here. For the above example, splitting up batches of deletes is not an issue, but there could be issues with batches of creates where only some are applied.
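To make the chunking idea concrete, here is a minimal sketch of splitting a caller-submitted list into bounded batches. It is not the ZKUtil implementation: the class name OpBatcher, the method partition, and the parameter maxOpsPerBatch are hypothetical names chosen for illustration, and the element type is left generic so the example does not depend on ZooKeeper classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of HBase's ZKUtil. */
public final class OpBatcher {
  private OpBatcher() {}

  /**
   * Splits ops into consecutive batches of at most maxOpsPerBatch elements,
   * preserving the caller's ordering. maxOpsPerBatch is an assumed limit;
   * the issue does not specify a value.
   */
  public static <T> List<List<T>> partition(List<T> ops, int maxOpsPerBatch) {
    if (maxOpsPerBatch <= 0) {
      throw new IllegalArgumentException("maxOpsPerBatch must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < ops.size(); start += maxOpsPerBatch) {
      int end = Math.min(start + maxOpsPerBatch, ops.size());
      // Copy the slice so each batch is independent of the input list.
      batches.add(new ArrayList<>(ops.subList(start, end)));
    }
    return batches;
  }
}
```

Assuming each resulting batch is then submitted as its own multi() call, any all-or-nothing behavior would apply per chunk rather than to the whole caller-submitted list, which is exactly the atomicity question raised above. Note also that a count-based limit only bounds the request size indirectly, since the bytes per Op vary with path length and data.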
Attachments
Issue Links
- is related to
  - HBASE-24544 Recommend upping zk jute.maxbuffer in all but minor installs (Open)