Uploaded image for project: 'Bookkeeper'
  1. Bookkeeper
  2. BOOKKEEPER-960

Re-replicator: bookie checking should handle flapping bookie registration

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: bookkeeper-server

      Description

      Problem:

      currently re-replicator only uses the view of available/readonly bookies right at the time doing bookie checking. it would accidentally treat a bookie disappeared from zookeeper (e.g. zookeeper session expired, bookie restarted, flapping bookie registration due to network/gc) as lost bookies, which introduce unnecessary re-replication.

      Solution:

      introduce 'auditorStaleBookieInterval', if a bookie never register in the given interval, it would be marked as 'stale' bookies and re-replicate all ledgers belongs to that bookie. the default value is set 30 minutes.

      Fixes:

      • refactor bookie watcher to allow notifying bookie list thru BookiesListener
      • introduce 'auditorStaleBookieInterval' to be able to mark bookies as 'stale' if bookies aren't registered themselves to zookeeper
      • add more info logging about critical steps on re-replication logic
      • misc changes

        Attachments

          Activity

            People

            • Assignee:
              yzang Yiming Zang
              Reporter:
              yzang Yiming Zang
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated: