  1. Bookkeeper
  2. BOOKKEEPER-237

Automatic recovery of under-replicated ledgers and its entries

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • 4.0.0, 4.1.0
    • None
    • None

    Description

      In the current design of BookKeeper, if one of the BookKeeper servers dies, there is no automatic mechanism to identify and recover the under-replicated ledgers and their entries. This can lead to the loss of successfully written entries, which is a critical problem in sensitive systems. This document describes a few proposals to overcome this limitation.

      Attachments

        1. Auto Recovery and Bookie sync-ups.pdf
          225 kB
          Rakesh Radhakrishnan
        2. Auto Recovery Detection - distributed chain approach.doc
          74 kB
          Rakesh Radhakrishnan
        3. BookKeeper-Auto-Recovery-Updated-To-Latest.pdf
          494 kB
          Uma Maheswara Rao G

      Sub-Tasks

        1. Recording of underreplication of ledger entries
          Sub-task, Closed, Ivan Kelly
        2. Detection of under replication
          Sub-task, Closed, Ivan Kelly
        3. Rereplicating of under replicated data
          Sub-task, Closed, Uma Maheswara Rao G
        4. Provide automatic mechanism to know bookie failures
          Sub-task, Closed, Rakesh Radhakrishnan
        5. Ability to disable auto recovery temporarily
          Sub-task, Closed, Rakesh Radhakrishnan
        6. bookkeeper does not put enough meta-data in to do recovery properly
          Sub-task, Closed, Ivan Kelly
        7. Periodic checking of ledger replication status
          Sub-task, Closed, Ivan Kelly
        8. Provide LedgerFragmentReplicator which should replicate the fragments found from LedgerChecker
          Sub-task, Closed, Uma Maheswara Rao G
        9. Prepare bookie vs ledgers cache and will be used by the Auditor
          Sub-task, Closed, Rakesh Radhakrishnan
        10. Provide distributed lock implementation which will be used by Replication worker while replicating fragments.
          Sub-task, Resolved, Uma Maheswara Rao G
        11. Exceptions for replication
          Sub-task, Closed, Ivan Kelly
        12. Manage auditing and replication processes
          Sub-task, Closed, Vinayakumar B
        13. Delay the replication of a ledger if RW found that its last fragment is in underReplication.
          Sub-task, Closed, Uma Maheswara Rao G
        14. Document about Auto replication service in BK
          Sub-task, Closed, Uma Maheswara Rao G
        15. LedgerManagers should consider 'underreplication' node as a special Znode
          Sub-task, Closed, Uma Maheswara Rao G
        16. ReplicationWorker may not get ZK watcher notification on UnderReplication ledger lock deletion.
          Sub-task, Closed, Uma Maheswara Rao G
        17. ZkLedgerUnderreplicationManager.markLedgerUnderreplicated() is adding duplicate missingReplicas while multiple bk failed for the same ledger
          Sub-task, Closed, Rakesh Radhakrishnan
        18. Clean up LedgerManagerFactory and LedgerManager usage in tests
          Sub-task, Closed, Rakesh Radhakrishnan
        19. replicateLedgerFragment should throw Exceptions in error conditions
          Sub-task, Closed, Uma Maheswara Rao G
        20. It should not be possible to replicate a ledger fragment which is at the end of an open ledger
          Sub-task, Closed, Ivan Kelly
        21. Ledger entries should be replicated sequentially instead of parallel.
          Sub-task, Closed, Uma Maheswara Rao G
        22. Let's add Thread name for ReplicationWorker thread.
          Sub-task, Closed, Uma Maheswara Rao G
        23. Integration Test - Perform bookie rereplication cycle by Auditor-RW processes
          Sub-task, Closed, Rakesh Radhakrishnan
        24. LedgerChecker returns underreplicated fragments for an closed ledger with no entries
          Sub-task, Closed, Ivan Kelly
        25. Hierarchical zk underreplication manager should clean up its hierarchy when done to allow for fast acquisition of underreplicated entries
          Sub-task, Closed, Ivan Kelly
        26. Store hostname of locker in replication lock
          Sub-task, Closed, Ivan Kelly
        27. SingleFragmentCallback should be created with the fragment first entry id, not the first stored id
          Sub-task, Resolved, Uma Maheswara Rao G
        28. Lock does not guarantee any access order and not giving chance to longest-waiting RW
          Sub-task, Resolved, Rakesh Radhakrishnan
        29. Make auditor Vote znode store a protobuf containing the host that voted
          Sub-task, Closed, Ivan Kelly
        30. Expose command options in bookie scripts to disable/enable auto recovery temporarily
          Sub-task, Closed, Rakesh Radhakrishnan
        31. Provide an option to start Autorecovery along with Bookie Servers
          Sub-task, Closed, Uma Maheswara Rao G
        32. Ensure that the auditor and replication worker will shutdown if they lose their ZK session
          Sub-task, Closed, Ivan Kelly
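
      The sub-task titles suggest a flow in which an auditor uses a bookie-to-ledgers cache to mark ledgers as under-replicated, and replication workers then take a per-ledger lock and re-replicate. The following is only a minimal in-memory sketch of that flow in Python, not BookKeeper's implementation: every function and variable name here is hypothetical, plain dicts stand in for the ZooKeeper metadata and locks, and "replication" is reduced to swapping a dead bookie for a live one in the ensemble, with no entry data copied.

```python
from collections import defaultdict


def build_bookie_ledger_index(ledger_ensembles):
    """Invert ledger -> bookies into bookie -> ledgers (hypothetical cache)."""
    index = defaultdict(set)
    for ledger_id, bookies in ledger_ensembles.items():
        for bookie in bookies:
            index[bookie].add(ledger_id)
    return index


def audit(ledger_ensembles, live_bookies, underreplicated):
    """Mark every ledger that has a replica on a bookie that is not live."""
    index = build_bookie_ledger_index(ledger_ensembles)
    for bookie, ledgers in index.items():
        if bookie not in live_bookies:
            for ledger_id in ledgers:
                # A set keeps missing replicas free of duplicates when
                # several bookies fail for the same ledger.
                underreplicated[ledger_id].add(bookie)


def replicate_one(worker, ledger_ensembles, live_bookies, underreplicated, locks):
    """One worker pass: lock a marked ledger, replace dead bookies, unlock.

    Returns the repaired ledger id, or None if nothing could be repaired.
    """
    for ledger_id in sorted(underreplicated):
        if ledger_id in locks:
            continue  # another worker holds this ledger's lock
        locks[ledger_id] = worker
        try:
            ensemble = ledger_ensembles[ledger_id]
            missing = underreplicated[ledger_id]
            for dead in sorted(missing):
                spare = sorted(live_bookies - set(ensemble))
                if not spare:
                    return None  # no target bookie; ledger stays marked
                # Stand-in for copying the fragment to the new bookie.
                ensemble[ensemble.index(dead)] = spare[0]
                missing.discard(dead)
            del underreplicated[ledger_id]
            return ledger_id
        finally:
            del locks[ledger_id]  # always release the lock
    return None


# Example: bookie b2 has died; ledgers 1 and 2 each had a replica on it.
ledgers = {1: ["b1", "b2"], 2: ["b2", "b3"], 3: ["b3", "b4"]}
live = {"b1", "b3", "b4", "b5"}
under = defaultdict(set)
locks = {}

audit(ledgers, live, under)
while replicate_one("rw-1", ledgers, live, under, locks) is not None:
    pass
```

      In this toy run, ledgers 1 and 2 are marked and then repaired one at a time, while ledger 3 is never touched. The lock is released in a `finally` block so that a failed pass does not leave a ledger permanently locked.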

          People

            Assignee: Rakesh Radhakrishnan (rakeshr)
            Reporter: Rakesh Radhakrishnan (rakeshr)
            Votes: 1
            Watchers: 13

            Dates

              Created:
              Updated:
              Resolved: