BOOKKEEPER-237

Automatic recovery of under-replicated ledgers and its entries

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Implemented
    • Affects Version/s: 4.0.0, 4.1.0
    • Fix Version/s: None
    • Labels: None

      Description

      As per the current design of BookKeeper, if one of the BookKeeper servers dies, there is no automatic mechanism to identify and recover the under-replicated ledgers and their corresponding entries. This puts successfully written entries at risk of being lost, which is a critical problem in sensitive systems. This document describes a few proposals to overcome this limitation.
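      To make the gap concrete, the following is a minimal sketch of the kind of check that is missing, not the implementation adopted for this issue. It assumes a hypothetical index from bookie address to the ids of the ledgers stored on that bookie, plus a set of bookies currently believed to be alive; the class name, method name, and bookie addresses are all invented for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: a hypothetical helper, not part of the BookKeeper API. */
public class UnderReplicationSketch {

    /**
     * Returns, for every ledger that has lost at least one replica, the
     * bookies that held a copy but are no longer alive.
     *
     * @param ledgersByBookie assumed index: bookie address -> ledger ids it stores
     * @param liveBookies     bookies currently considered alive
     */
    public static SortedMap<Long, SortedSet<String>> findUnderReplicated(
            Map<String, Set<Long>> ledgersByBookie, Set<String> liveBookies) {
        SortedMap<Long, SortedSet<String>> missing = new TreeMap<>();
        for (Map.Entry<String, Set<Long>> entry : ledgersByBookie.entrySet()) {
            String bookie = entry.getKey();
            if (liveBookies.contains(bookie)) {
                continue; // replicas on a live bookie are still available
            }
            for (Long ledgerId : entry.getValue()) {
                // A set keeps each failed bookie listed at most once per ledger.
                missing.computeIfAbsent(ledgerId, id -> new TreeSet<>()).add(bookie);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, Set<Long>> index = new HashMap<>();
        index.put("bookie-a:3181", new HashSet<>(Arrays.asList(1L, 2L)));
        index.put("bookie-b:3181", new HashSet<>(Arrays.asList(2L, 3L)));
        index.put("bookie-c:3181", new HashSet<>(Arrays.asList(1L, 3L)));

        // bookie-b has died, so ledgers 2 and 3 each lost one replica.
        Set<String> live = new HashSet<>(Arrays.asList("bookie-a:3181", "bookie-c:3181"));

        // prints {2=[bookie-b:3181], 3=[bookie-b:3181]}
        System.out.println(findUnderReplicated(index, live));
    }
}
```

      The sketch only identifies which ledgers are affected; copying the missing entries to another bookie and coordinating that work between processes are separate concerns that it does not attempt to model.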

      1. Recording of underreplication of ledger entries (Sub-task, Closed, Ivan Kelly)
      2. Detection of under replication (Sub-task, Closed, Ivan Kelly)
      3. Rereplicating of under replicated data (Sub-task, Closed, Uma Maheswara Rao G)
      4. Provide automatic mechanism to know bookie failures (Sub-task, Closed, Rakesh R)
      5. Ability to disable auto recovery temporarily (Sub-task, Closed, Rakesh R)
      6. bookkeeper does not put enough meta-data in to do recovery properly (Sub-task, Closed, Ivan Kelly)
      7. Periodic checking of ledger replication status (Sub-task, Closed, Ivan Kelly)
      8. Provide LedgerFragmentReplicator which should replicate the fragments found from LedgerChecker (Sub-task, Closed, Uma Maheswara Rao G)
      9. Prepare bookie vs ledgers cache and will be used by the Auditor (Sub-task, Closed, Rakesh R)
      10. Provide distributed lock implementation which will be used by Replication worker while replicating fragments. (Sub-task, Resolved, Uma Maheswara Rao G)
      11. Exceptions for replication (Sub-task, Closed, Ivan Kelly)
      12. Manage auditing and replication processes (Sub-task, Closed, Vinayakumar B)
      13. Delay the replication of a ledger if RW found that its last fragment is in underReplication. (Sub-task, Closed, Uma Maheswara Rao G)
      14. Document about Auto replication service in BK (Sub-task, Closed, Uma Maheswara Rao G)
      15. LedgerManagers should consider 'underreplication' node as a special Znode (Sub-task, Closed, Uma Maheswara Rao G)
      16. ReplicationWorker may not get ZK watcher notification on UnderReplication ledger lock deletion. (Sub-task, Closed, Uma Maheswara Rao G)
      17. ZkLedgerUnderreplicationManager.markLedgerUnderreplicated() is adding duplicate missingReplicas while multiple bk failed for the same ledger (Sub-task, Closed, Rakesh R)
      18. Clean up LedgerManagerFactory and LedgerManager usage in tests (Sub-task, Closed, Rakesh R)
      19. replicateLedgerFragment should throw Exceptions in error conditions (Sub-task, Closed, Uma Maheswara Rao G)
      20. It should not be possible to replicate a ledger fragment which is at the end of an open ledger (Sub-task, Closed, Ivan Kelly)
      21. Ledger entries should be replicated sequentially instead of parallel. (Sub-task, Closed, Uma Maheswara Rao G)
      22. Let's add Thread name for ReplicationWorker thread. (Sub-task, Closed, Uma Maheswara Rao G)
      23. Integration Test - Perform bookie rereplication cycle by Auditor-RW processes (Sub-task, Closed, Rakesh R)
      24. LedgerChecker returns underreplicated fragments for an closed ledger with no entries (Sub-task, Closed, Ivan Kelly)
      25. Hierarchical zk underreplication manager should clean up its hierarchy when done to allow for fast acquisition of underreplicated entries (Sub-task, Closed, Ivan Kelly)
      26. Store hostname of locker in replication lock (Sub-task, Closed, Ivan Kelly)
      27. SingleFragmentCallback should be created with the fragment first entry id, not the first stored id (Sub-task, Resolved, Uma Maheswara Rao G)
      28. Lock does not guarantee any access order and not giving chance to longest-waiting RW (Sub-task, Resolved, Rakesh R)
      29. Make auditor Vote znode store a protobuf containing the host that voted (Sub-task, Closed, Ivan Kelly)
      30. Expose command options in bookie scripts to disable/enable auto recovery temporarily (Sub-task, Closed, Rakesh R)
      31. Provide an option to start Autorecovery along with Bookie Servers (Sub-task, Closed, Uma Maheswara Rao G)
      32. Ensure that the auditor and replication worker will shutdown if they lose their ZK session (Sub-task, Closed, Ivan Kelly)

        Activity

        Rakesh R created issue -
        Rakesh R made changes -
        Field Original Value New Value
        Attachment Auto Recovery and Bookie sync-ups.pdf [ 12525615 ]
        Rakesh R made changes -
        Uma Maheswara Rao G made changes -
        Component/s bookkeeper-auto-recovery [ 12319500 ]
        Component/s bookkeeper-client [ 12314393 ]
        Component/s bookkeeper-server [ 12314394 ]
        Uma Maheswara Rao G made changes -
        Flavio Junqueira made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Implemented [ 10 ]
        Flavio Junqueira made changes -
        Fix Version/s 4.2.0 [ 12320244 ]
        Flavio Junqueira made changes -
        Affects Version/s 4.1.0 [ 12319145 ]
        Ivan Kelly made changes -
        Fix Version/s 4.2.0 [ 12320244 ]

          People

          • Assignee: Rakesh R
          • Reporter: Rakesh R
          • Votes: 1
          • Watchers: 14

            Dates

            • Created:
              Updated:
              Resolved:
