Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.3.3, 3.4.0
    • Component/s: contrib-bookkeeper
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Recover the ledger fragments of a bookie once it crashes.

      1. ZOOKEEPER-712-3.3.patch
        57 kB
        Flavio Junqueira
      2. ZOOKEEPER-712.patch
        57 kB
        Erwin Tam

        Issue Links

          Activity

          Erwin Tam added a comment -

          First pass at implementing bookie recovery as an admin tool. The patch uploaded has two new files:
          BookieRecoveryTest.java and BookKeeperTools.java (with a corresponding new directory under bookkeeper/tools).

          The main API that BookKeeperTools.java implements is the following:
          public void asyncRecoverBookieData(final InetSocketAddress bookieSrc, final InetSocketAddress bookieDest,
          final RecoverCallback cb, final Object context);

          The synchronous version of this API is here:
          public void recoverBookieData(final InetSocketAddress bookieSrc, final InetSocketAddress bookieDest)
          throws InterruptedException;

          This recovers all of the ledger data that was present on the dead input bookieSrc. The second input, bookieDest, is optional: if it is passed, all data is recovered to that specific bookie server. Otherwise, for each ledger being recovered that was stored on the bookieSrc, we choose one of the other available bookie servers and re-replicate the ledger fragment entries there.
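As a self-contained sketch of how the synchronous variant can relate to the asynchronous one, the blocking call can wrap the callback-based call with a latch. The RecoverCallback shape, the stub body of asyncRecoverBookieData, and the rc==0 success convention below are assumptions for illustration, not the patch's actual implementation:

```java
import java.net.InetSocketAddress;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class RecoverSketch {
    // Hypothetical callback interface mirroring the RecoverCallback named above.
    interface RecoverCallback {
        void recoverComplete(int rc, Object ctx);
    }

    // Stand-in for asyncRecoverBookieData: the real method does the recovery
    // work; this stub just reports success immediately so the sketch runs.
    static void asyncRecoverBookieData(InetSocketAddress bookieSrc,
                                       InetSocketAddress bookieDest,
                                       RecoverCallback cb, Object ctx) {
        cb.recoverComplete(0, ctx); // 0 = OK in this sketch
    }

    // The synchronous wrapper: fire the async call, then block on a latch
    // until the callback delivers a return code.
    static void recoverBookieData(InetSocketAddress bookieSrc,
                                  InetSocketAddress bookieDest)
            throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        AtomicInteger result = new AtomicInteger(-1);
        asyncRecoverBookieData(bookieSrc, bookieDest, (rc, ctx) -> {
            result.set(rc);
            done.countDown();
        }, null);
        done.await();
        if (result.get() != 0) {
            throw new RuntimeException("recovery failed, rc=" + result.get());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        recoverBookieData(new InetSocketAddress("bk1", 3181), null);
        System.out.println("recovered");
    }
}
```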

          A command-line way to invoke this is via the main method in BookKeeperTools, which expects the following 2-3 input parameters:
          "USAGE: BookKeeperTools zkServers bookieSrc [bookieDest]"
          zkServers is a comma-separated list of host:port pairs for the ZooKeeper servers in the cluster.
          bookieSrc is the host:port for the bookie server we are recovering data from.
          bookieDest is the host:port (optional) for the bookie server we want to recover the data to.
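A minimal sketch of argument handling matching that USAGE string; the parseHostPort helper and the class name are hypothetical, not taken from the patch:

```java
import java.net.InetSocketAddress;

public class ArgsSketch {
    // Split a "host:port" string into an InetSocketAddress.
    static InetSocketAddress parseHostPort(String s) {
        int colon = s.lastIndexOf(':');
        return new InetSocketAddress(s.substring(0, colon),
                                     Integer.parseInt(s.substring(colon + 1)));
    }

    public static void main(String[] args) {
        // Two mandatory parameters plus one optional one, as in the USAGE line.
        if (args.length < 2 || args.length > 3) {
            System.out.println("USAGE: BookKeeperTools zkServers bookieSrc [bookieDest]");
            return;
        }
        String zkServers = args[0]; // comma-separated host:port pairs, passed through
        InetSocketAddress bookieSrc = parseHostPort(args[1]);
        InetSocketAddress bookieDest = args.length == 3 ? parseHostPort(args[2]) : null;
        System.out.println(zkServers + " -> recover " + bookieSrc
                + (bookieDest != null ? " to " + bookieDest : " to a random bookie"));
    }
}
```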

          Current limitations: There is no way to know each ledger's digest type or key/password; that metadata isn't stored anywhere we can readily access. For now we therefore assume that all ledgers to be recovered were created with the same digest type and password. These values are set via the Java system properties "digestType" and "passwd". Once we have a way to store and retrieve this information (most likely via ZK, written by the bookie servers when new ledgers are created), we can revisit this aspect of the bookie recovery tool.
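A tiny sketch of reading those two system properties; the property names come from the description above, while the default values are assumptions of this sketch:

```java
public class DigestConfig {
    // "digestType" and "passwd" are the JVM system property names the tool
    // expects; the fallback defaults here are assumed, not from the patch.
    static String digestType() {
        return System.getProperty("digestType", "CRC32");
    }

    static String passwd() {
        return System.getProperty("passwd", "");
    }

    public static void main(String[] args) {
        // e.g. java -DdigestType=MAC -Dpasswd=secret DigestConfig
        System.out.println(digestType() + " / " + passwd().length() + "-char password");
    }
}
```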

          Pseudocode:
          Inputs:
          zkServers - Used to create a ZK client so we can read in all of the BK metadata needed to perform the bookie recovery.
          bookieSrc - Used to match against ledger metadata indicating which bookie servers comprise the ensembles that make up a ledger.
          bookieDest - Optionally used to write the recovered ledger data to directly.

          1. Sync with ZK
          2. Read from ZK to get all available bookie servers (only if bookieDest was not passed).
          3. Read from ZK to get all active ledgers.
          4. For each ledger, open it to obtain the LedgerHandle.
          5. For each ledger fragment, see if the dead input bookieSrc is a part of the ensemble for it.
          6. For each ledger fragment that needs to be recovered, find the entries that were actually stored on the bookieSrc. Since we stripe data across bookies in the ensemble, not all ledger entries in the fragment would have been stored on the bookieSrc.
          7. For each of the ledger entries in the fragment that were stored on the bookieSrc, use the BookKeeper client to read it. Using the client will take care of choosing one of the other bookies where this data is available since the bookieSrc server is dead.
          8. Choose a new bookieDest (either passed in explicitly or, currently, picked at random from the available bookie servers) and write the ledger entry directly to it using the BookieClient (the layer that talks to the bookie servers directly).
          9. Once all ledger entries for all fragments for a ledger have been recovered and re-replicated, update ZK so the ledger's metadata now points to the new bookie server where this data has been re-replicated to. We choose a single new bookie for each ledger to store the recovered data.
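Step 6 above can be sketched as follows, assuming simple round-robin striping in which entry e of a fragment lands on ensemble slot e % ensembleSize; the real placement may also depend on the write quorum size, so treat this as an illustration only:

```java
import java.util.ArrayList;
import java.util.List;

public class StripeSketch {
    // Under round-robin striping (an assumption of this sketch), only entries
    // whose slot index matches the dead bookie's position in the ensemble were
    // stored on it, and only those need re-replication.
    static List<Long> entriesOnBookie(long firstEntry, long lastEntry,
                                      int ensembleSize, int deadBookieIndex) {
        List<Long> out = new ArrayList<>();
        for (long e = firstEntry; e <= lastEntry; e++) {
            if (e % ensembleSize == deadBookieIndex) {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Fragment covering entries 0..9 on a 3-bookie ensemble; the bookie
        // at ensemble index 1 died.
        System.out.println(entriesOnBookie(0, 9, 3, 1)); // [1, 4, 7]
    }
}
```

This is why step 6 matters: re-replicating the whole fragment would copy entries the dead bookie never held.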

          Hadoop QA added a comment -

          +1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12447283/ZOOKEEPER-712.patch
          against trunk revision 953041.

          +1 @author. The patch does not contain any @author tags.

          +1 tests included. The patch appears to include 3 new or modified tests.

          +1 javadoc. The javadoc tool did not generate any warning messages.

          +1 javac. The applied patch does not increase the total number of javac compiler warnings.

          +1 findbugs. The patch does not introduce any new Findbugs warnings.

          +1 release audit. The applied patch does not increase the total number of release audit warnings.

          +1 core tests. The patch passed core unit tests.

          +1 contrib tests. The patch passed contrib unit tests.

          Test results: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/118/testReport/
          Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/118/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
          Console output: http://hudson.zones.apache.org/hudson/job/Zookeeper-Patch-h1.grid.sp2.yahoo.net/118/console

          This message is automatically generated.

          Benjamin Reed added a comment -

          there are some hardcoded assumptions in the code that can be removed once ZOOKEEPER-807 is addressed.

          Benjamin Reed added a comment -

          +1 looks good. thanx erwin!

          Benjamin Reed added a comment -

          Committed revision 962697.

          Hudson added a comment -

          Integrated in ZooKeeper-trunk #881 (See http://hudson.zones.apache.org/hudson/job/ZooKeeper-trunk/881/)

          Flavio Junqueira added a comment -

          Uploading a patch for 3.3.

          Mahadev konar added a comment -

          I just pushed this to 3.3 branch. thanks flavio!


            People

            • Assignee:
              Erwin Tam
              Reporter:
              Flavio Junqueira
            • Votes:
              0
              Watchers:
              0

              Dates

              • Created:
                Updated:
                Resolved:

                Development