[KUDU-1020] ksck with snapshot reports divergence even if a server is just behind - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: Private Beta
Fix Version/s: 1.4.0
Component/s: ksck
Labels: None
Target Version/s: Public beta

Description

Something seems to be wrong with how ksck handles checksum timestamps. I ran ksck on a recently restarted cluster. One of the tablets has a replica which was "lost", i.e. it fell too far behind and therefore could never be caught up. ksck just reports it as a bad checksum. Shouldn't it instead wait until the provided timestamp is "safe" and, if the wait times out, give an error saying that the replica is too far behind?
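
To make the proposal concrete, here is a minimal sketch of the suggested behavior, written in Python for brevity rather than in Kudu's C++. Every name in it (`wait_until_safe`, `check_replica`, `get_safe_time`, `compute_checksum`) is hypothetical and is not part of ksck or any Kudu API. It only illustrates the idea of waiting for the snapshot timestamp to become safe and reporting a lagging replica separately from a real checksum mismatch.

```python
import time

def wait_until_safe(get_safe_time, snapshot_ts, timeout_s, poll_s=0.01):
    """Poll until the replica's safe time reaches snapshot_ts or the timeout expires.

    get_safe_time is a hypothetical callable returning the replica's current safe time.
    Returns True if the timestamp became safe, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_safe_time() >= snapshot_ts:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)

def check_replica(get_safe_time, compute_checksum, expected, snapshot_ts, timeout_s):
    """Classify one replica as "OK", "MISMATCH", or "BEHIND".

    compute_checksum is a hypothetical callable that checksums the replica
    at the given snapshot timestamp.
    """
    # A replica that never reaches the snapshot timestamp is lagging, not divergent.
    if not wait_until_safe(get_safe_time, snapshot_ts, timeout_s):
        return "BEHIND"
    # Only compare checksums once the snapshot is known to be safe on this replica.
    return "OK" if compute_checksum(snapshot_ts) == expected else "MISMATCH"
```

With this shape, a replica stuck at an old safe time would be reported as "BEHIND" once the timeout expires, instead of as a bad checksum.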

As a stopgap, maybe we could have ksck also include the latest opid in the error printout, to make it more obvious that a server is just "behind" and not divergent?
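
As a rough illustration of that stopgap, the sketch below formats a mismatch message that includes the replica's latest opid next to the leader's. It is Python for brevity, and the `OpId` type, the `describe_mismatch` function, and the message wording are all assumptions made for this example, not ksck's actual output.

```python
from typing import NamedTuple

class OpId(NamedTuple):
    """Hypothetical (term, index) pair; tuples compare by term first, then index."""
    term: int
    index: int

def describe_mismatch(replica, checksum, expected, last_opid, leader_opid):
    """Build an error line that shows whether the replica is lagging the leader."""
    lag = "behind leader" if last_opid < leader_opid else "caught up with leader"
    return (
        f"{replica}: checksum {checksum:#x} != expected {expected:#x}; "
        f"last opid {last_opid.term}.{last_opid.index}, "
        f"leader opid {leader_opid.term}.{leader_opid.index} ({lag})"
    )
```

Seeing an opid well below the leader's next to the bad checksum would hint that the replica is lagging rather than divergent.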

Issue Links

relates to

KUDU-1012 ksck says "snapshot time in the future" on busy table (Resolved)

KUDU-1056 Make ksck support safe time and rework ksck snapshot tests (Resolved)

People

Assignee: William Berkeley

Reporter: Todd Lipcon

Votes: 0

Watchers: 2

Dates

Created: 17/Aug/15 13:46

Updated: 16/May/17 17:20

Resolved: 16/May/17 17:20