Details
- Type: Bug
- Status: Resolved
- Priority: Normal
- Resolution: Fixed
- Fix Version: None
- Severity: Normal
Description
It's currently possible for DataResolver to accumulate more read repair changes than would fit in a single serialized mutation. If that happens, the node receiving the mutation will fail to apply it, the read will time out, and reads of that partition won't be able to proceed until the operator runs repair or manually drops the affected partitions.
Ideally we should either perform read repair iteratively, or at least split the resulting mutation into smaller chunks at the end. In the meantime, for 3.0.x, I suggest we add logging to catch this, and a -D flag to allow proceeding with the requests as is, without read repair, when the mutation is too large.
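A minimal sketch of the "split into smaller chunks" idea: pack the accumulated changes greedily into groups whose combined serialized size stays under a limit, and send one mutation per group. All names here are illustrative (this is not Cassandra's DataResolver API), and changes are represented only by their serialized sizes for brevity; an oversized single change still gets its own chunk, which the caller would have to handle separately (e.g. log and skip read repair for it).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: instead of one oversized read-repair mutation,
// split the accumulated changes into chunks that each fit under maxSize.
public class ReadRepairChunker {
    // Greedily packs per-change serialized sizes into chunks whose total
    // stays <= maxSize. A single change larger than maxSize ends up alone
    // in its own (still oversized) chunk for the caller to deal with.
    static List<List<Integer>> chunkBySize(List<Integer> changeSizes, int maxSize) {
        List<List<Integer>> chunks = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentSize = 0;
        for (int size : changeSizes) {
            // Flush the current chunk before it would overflow.
            if (!current.isEmpty() && currentSize + size > maxSize) {
                chunks.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(size);
            currentSize += size;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

Greedy packing keeps the change simple and preserves the original ordering of the changes, at the cost of not producing a minimal number of chunks.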