Kudu / KUDU-1020

ksck with snapshot reports divergence even if a server is just behind

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Private Beta
    • Fix Version/s: 1.4.0
    • Component/s: ksck
    • Labels: None

    Description

      Something seems to be wrong with how ksck handles checksum timestamps. I have a recently restarted cluster, and I ran ksck. One of the tablets has a replica which was "lost", i.e. it fell too far behind and therefore could never be caught up. ksck just reports it as a bad checksum. Shouldn't it instead wait until the provided timestamp is "safe" and, if the wait times out, report an error saying that the replica is too far behind?
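
A minimal sketch of the behavior suggested above, written in Python for illustration only. It is not ksck's actual code: `wait_until_safe`, `ReplicaTooFarBehind`, and the `get_safe_time` callback are hypothetical names, and timestamps are modeled as plain integers.

```python
import time


class ReplicaTooFarBehind(Exception):
    """Hypothetical error: the replica's safe time never reached the snapshot timestamp."""


def wait_until_safe(get_safe_time, snapshot_ts, timeout_s=5.0, poll_s=0.05):
    """Poll a replica until its safe time covers snapshot_ts, or give up.

    get_safe_time is an assumed callback returning the replica's current
    safe timestamp as an integer. Returns the observed safe time on success.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        safe = get_safe_time()
        if safe >= snapshot_ts:
            # The snapshot timestamp is now safe, so a checksum scan at
            # snapshot_ts would be meaningful on this replica.
            return safe
        if time.monotonic() >= deadline:
            # Report "behind" explicitly instead of a checksum mismatch.
            raise ReplicaTooFarBehind(
                "replica is too far behind: safe time %d < snapshot timestamp %d"
                % (safe, snapshot_ts)
            )
        time.sleep(poll_s)
```

With a check like this in front of the checksum scan, a lagging replica would surface as a distinct "too far behind" error rather than being folded into the checksum mismatch report.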

      As a stopgap, maybe we could have ksck also include the latest opid in the error printout, to make it more obvious that a server is just "behind" and not divergent?
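
The stopgap could look roughly like the sketch below, again in Python and purely illustrative. `ReplicaResult` and `format_mismatch` are hypothetical, the output format is invented for this example, and opids are assumed to be (term, index) pairs printed as `term.index`. A lagging opid is only a hint that the replica is behind; it does not prove the replica has not diverged.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ReplicaResult:
    """Hypothetical per-replica checksum result."""
    server: str
    checksum: int
    last_opid: Tuple[int, int]  # assumed (term, index)


def format_mismatch(results: List[ReplicaResult]) -> List[str]:
    """Build one report line per replica, flagging replicas whose opid lags."""
    newest = max(r.last_opid for r in results)  # tuples compare by term, then index
    lines = []
    for r in results:
        note = ""
        if r.last_opid != newest:
            note = " (behind: newest opid is %d.%d)" % newest
        lines.append(
            "%s: checksum=%d last_opid=%d.%d%s"
            % (r.server, r.checksum, r.last_opid[0], r.last_opid[1], note)
        )
    return lines


if __name__ == "__main__":
    report = format_mismatch([
        ReplicaResult("ts-1", 111, (3, 500)),
        ReplicaResult("ts-2", 222, (3, 120)),
    ])
    print("\n".join(report))
```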

            People

              Assignee: William Berkeley (wdberkeley)
              Reporter: Todd Lipcon (tlipcon)
