HBase
  1. HBase
  2. HBASE-4618 HBase backups
  3. HBASE-5509

MR based copier for copying HFiles (trunk version)

    Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: documentation, regionserver
    • Labels:
      None

      Description

      This copier is a modification of the distcp tool in HDFS. It does the following:

      1. List out all the regions in the HBase cluster for the required table
      2. Write the above out to a file
      3. Each mapper
      3.1 lists all the HFiles for a given region by querying the regionserver
      3.2 copies all the HFiles
      3.3 outputs success if the copy succeeded, failure otherwise. Failed regions are retried in another loop
      4. Mappers are placed on nodes which have maximum locality for a given region to speed up copying

      1. 5509.txt
        53 kB
        Lars Hofhansl
      2. 5509-v2.txt
        54 kB
        Lars Hofhansl

        Issue Links

          Activity

          Hide
          Lars Hofhansl added a comment -

          This is essentially the patch I got from Karthik.

          I made the following changes:
          1. API changes for trunk.
          2. Also copies and patches .tableinfo files as needed.
          3. ImportTable can proceed even while HBase is down (HDFS needs to be up obviously). Regions will assigned upon HBase startup.
          4. A ZK quorum can be optionally provided to assign imported regions right away (needed in trunk).
          5. Some safety checks to verify the backup directory provided for import is in fact a backup directory (otherwise in some case adjacent directories were overwritten with temporary files).

          Show
          Lars Hofhansl added a comment - This is essentially the patch I got from Karthik. I made the following changes: 1. API changes for trunk. 2. Also copies and patches .tableinfo files as needed. 3. ImportTable can proceed even while HBase is down (HDFS needs to be up obviously). Regions will assigned upon HBase startup. 4. A ZK quorum can be optionally provided to assign imported regions right away (needed in trunk). 5. Some safety checks to verify the backup directory provided for import is in fact a backup directory (otherwise in some case adjacent directories were overwritten with temporary files).
          Hide
          Lars Hofhansl added a comment -

          This is a work on progress!

          Show
          Lars Hofhansl added a comment - This is a work on progress!
          Hide
          Ted Yu added a comment -

          The following already exists in FSTableDescriptors.java:

          +  public static boolean isTableInfoExists(FileSystem fs, Path tabledir)
          

          Can the patch be refreshed based on current TRUNK ?

          Show
          Ted Yu added a comment - The following already exists in FSTableDescriptors.java: + public static boolean isTableInfoExists(FileSystem fs, Path tabledir) Can the patch be refreshed based on current TRUNK ?
          Hide
          Lars Hofhansl added a comment -

          @Ted: This is against trunk.

          Show
          Lars Hofhansl added a comment - @Ted: This is against trunk.
          Hide
          Ted Yu added a comment -

          Right.
          The existing isTableInfoExists() has a different signature.

          Pardon me.

          Show
          Ted Yu added a comment - Right. The existing isTableInfoExists() has a different signature. Pardon me.
          Hide
          Ted Yu added a comment - - edited

          SnapshotUtilities.java misses license and javadoc for the class.

          +  public static boolean sameFile(FileSystem srcfs, FileStatus srcstatus,
          +      FileSystem dstfs, Path dstpath, boolean skipCRCCheck) throws IOException {
          

          Is it possible to make the src and dst comply to same data type ? Either FileStatus or Path.

          For sameFile(), I think false should be returned for dest file in the following case:

          +      //return true if checksum is not supported
          +      //(i.e. some of the checksums is null)
          
          +  public static Path getPathInTrash(Path path, String hbaseUser,
          +      FileSystem srcFileSys) throws IOException {
          

          I think FileSystem parameter should be placed as first parameter for the above method.

          +    String trashPrefix = "/user/" + hbaseUser + "/.Trash";
          

          I think the name of trash folder should be made configurable.

          For getStoreFileList():

          +   * @param families
          +   *          a comma separated list of column families for which we need to
          

          I think List<String> may be better data type for families parameter. This would make this method more general in that it is not tied to the format of user input.

          +    long retryTimeInMins =
          +      conf.getInt("hbase.backups.region.retryTimeInMins", 5) * 60 * 1000L;
          

          Please rename the above variable which is converted to millis unit.

          SnapshotMR.java misses license.

          Show
          Ted Yu added a comment - - edited SnapshotUtilities.java misses license and javadoc for the class. + public static boolean sameFile(FileSystem srcfs, FileStatus srcstatus, + FileSystem dstfs, Path dstpath, boolean skipCRCCheck) throws IOException { Is it possible to make the src and dst comply to same data type ? Either FileStatus or Path. For sameFile(), I think false should be returned for dest file in the following case: + // return true if checksum is not supported + //(i.e. some of the checksums is null ) + public static Path getPathInTrash(Path path, String hbaseUser, + FileSystem srcFileSys) throws IOException { I think FileSystem parameter should be placed as first parameter for the above method. + String trashPrefix = "/user/" + hbaseUser + "/.Trash" ; I think the name of trash folder should be made configurable. For getStoreFileList(): + * @param families + * a comma separated list of column families for which we need to I think List<String> may be better data type for families parameter. This would make this method more general in that it is not tied to the format of user input. + long retryTimeInMins = + conf.getInt( "hbase.backups.region.retryTimeInMins" , 5) * 60 * 1000L; Please rename the above variable which is converted to millis unit. SnapshotMR.java misses license.
          Hide
          Jesse Yates added a comment -

          think it might be time to RB this bad boy; I've got a bunch of comments of my own.

          Show
          Jesse Yates added a comment - think it might be time to RB this bad boy; I've got a bunch of comments of my own.
          Hide
          Lars Hofhansl added a comment -

          It's not ready for RB. Note that this is the Facebook patch ported to trunk with the changes I mentioned. All points Ted mentioned are from the FB patch.

          The types of comment I am looking for are: 1. do we want to this route at all 2. general comments on failure scenarios.
          Then I can go and clean up the finer points.

          Show
          Lars Hofhansl added a comment - It's not ready for RB. Note that this is the Facebook patch ported to trunk with the changes I mentioned. All points Ted mentioned are from the FB patch. The types of comment I am looking for are: 1. do we want to this route at all 2. general comments on failure scenarios. Then I can go and clean up the finer points.
          Hide
          Ted Yu added a comment -

          I think Karthick may tell us something about the failure scenarios they have handled through this approach

          SnapshotMR.FileReporter's run() method only sleeps. What purpose does FileReporter serve ?

          +   * Map method. Copies one file from source file system to destination.
          

          The above is inaccurate: every file returned from SnapshotUtilities.getStoreFileList() is copied.

          Show
          Ted Yu added a comment - I think Karthick may tell us something about the failure scenarios they have handled through this approach SnapshotMR.FileReporter's run() method only sleeps. What purpose does FileReporter serve ? + * Map method. Copies one file from source file system to destination. The above is inaccurate: every file returned from SnapshotUtilities.getStoreFileList() is copied.
          Hide
          Lars Hofhansl added a comment -

          FileReporter seems like a left-over that was never used. Should be removed.
          I'm going to post another patch soon with some cleanups.

          The command options need to be documented better. In fact the argument parsing should be improved too.

          Generally: It o.a.h.h.backups the right place to put this? Do we want this in core HBase?

          Show
          Lars Hofhansl added a comment - FileReporter seems like a left-over that was never used. Should be removed. I'm going to post another patch soon with some cleanups. The command options need to be documented better. In fact the argument parsing should be improved too. Generally: It o.a.h.h.backups the right place to put this? Do we want this in core HBase?
          Hide
          Jesse Yates added a comment -

          The command options need to be documented better. In fact the argument parsing should be improved too.

          +1

          Generally: It o.a.h.h.backups the right place to put this? Do we want this in core HBase?

          IMO, it should definitely be part of core. Think about the most common DBs, backup/snapshot is part of the database, as opposed to some other tool that you get from somewhere else. We can always break the paradigm, but it seems to fit in this case.

          1. do we want to this route at all

          I think this approach is pretty reasonable. To get 'real' snapshotting, we will obviously have to do a bit more work, but this is the right approach to get there. Ideally, I should just be able to hook up the region files to another cluster and be able to recover/rollback to the previous state. This seems the safest and fastest, though debatable how much of either and if its worth the work at the moment.

          Show
          Jesse Yates added a comment - The command options need to be documented better. In fact the argument parsing should be improved too. +1 Generally: It o.a.h.h.backups the right place to put this? Do we want this in core HBase? IMO, it should definitely be part of core. Think about the most common DBs, backup/snapshot is part of the database, as opposed to some other tool that you get from somewhere else. We can always break the paradigm, but it seems to fit in this case. 1. do we want to this route at all I think this approach is pretty reasonable. To get 'real' snapshotting, we will obviously have to do a bit more work, but this is the right approach to get there. Ideally, I should just be able to hook up the region files to another cluster and be able to recover/rollback to the previous state. This seems the safest and fastest, though debatable how much of either and if its worth the work at the moment.
          Hide
          Lars Hofhansl added a comment -

          For sameFile(), I think false should be returned for dest file in the following case:

          +      //return true if checksum is not supported
          +      //(i.e. some of the checksums is null)
          

          Not sure I agree with that. The method comment says that if either FS does not support checksums and that check is ignored. I.e. if any of the earlier tests (like size comparison) did not flag the files as different they are considered equal).

          Show
          Lars Hofhansl added a comment - For sameFile(), I think false should be returned for dest file in the following case: + // return true if checksum is not supported + //(i.e. some of the checksums is null ) Not sure I agree with that. The method comment says that if either FS does not support checksums and that check is ignored. I.e. if any of the earlier tests (like size comparison) did not flag the files as different they are considered equal).
          Hide
          Lars Hofhansl added a comment -

          Is it possible to make the src and dst comply to same data type ? Either FileStatus or Path.

          It is. It means the code in SnapshotMR would be slightly less readable. On the other hand we'd do fewer RPC to get the file system status. Also requires sameFile to be package private, otherwise we need to double check the file here. I'll do that and then we can decide.

          Show
          Lars Hofhansl added a comment - Is it possible to make the src and dst comply to same data type ? Either FileStatus or Path. It is. It means the code in SnapshotMR would be slightly less readable. On the other hand we'd do fewer RPC to get the file system status. Also requires sameFile to be package private, otherwise we need to double check the file here. I'll do that and then we can decide.
          Hide
          Lars Hofhansl added a comment -

          For getStoreFileList():

          +   * @param families
          +   *          a comma separated list of column families for which we need to
          
          I think List<String> may be better data type for families parameter. This would make this method more general in that

          Families is taken from the command line; it might be (1) an indicator for all CFs, or (2) a list of specific CFs. In the former case we cannot get the list of CFs before we know the RegionServer, which only happens later.

          Again, this is not my code. I can refactor and have extra command line flags and extra code for this.

          Show
          Lars Hofhansl added a comment - For getStoreFileList(): + * @param families + * a comma separated list of column families for which we need to I think List<String> may be better data type for families parameter. This would make this method more general in that Families is taken from the command line; it might be (1) an indicator for all CFs, or (2) a list of specific CFs. In the former case we cannot get the list of CFs before we know the RegionServer, which only happens later. Again, this is not my code. I can refactor and have extra command line flags and extra code for this.
          Hide
          Lars Hofhansl added a comment -

          Addressed some of the format issues raised.

          Another thought: It would be great if files outputted by SnapshotMR could be used with LoadIncrementalHFiles.

          Show
          Lars Hofhansl added a comment - Addressed some of the format issues raised. Another thought: It would be great if files outputted by SnapshotMR could be used with LoadIncrementalHFiles.
          Hide
          Karthik Ranganathan added a comment -

          @Zhihong Yu:
          We use this code as the primary means to backup HFiles inside FB. We have done a lot of improvements to the DFS copy underneath, and they have caused some bugs, but thats unrelated to this code. Not too many issues, besides tuning the number of mappers to use so that we dont overwhelm a running system.

          @Lars:
          You are correct about getStoreFileList() - it is passed from commandline and it is overloaded for a subset/all CF's. Zhihong - the list versus a comma-separated string is a trivial point since the list construction has to happen either in the RS or in the caller, so should not make much of a difference practically.

          Show
          Karthik Ranganathan added a comment - @Zhihong Yu: We use this code as the primary means to backup HFiles inside FB. We have done a lot of improvements to the DFS copy underneath, and they have caused some bugs, but thats unrelated to this code. Not too many issues, besides tuning the number of mappers to use so that we dont overwhelm a running system. @Lars: You are correct about getStoreFileList() - it is passed from commandline and it is overloaded for a subset/all CF's. Zhihong - the list versus a comma-separated string is a trivial point since the list construction has to happen either in the RS or in the caller, so should not make much of a difference practically.
          Hide
          Ted Yu added a comment -

          I agree about the point w.r.t. getStoreFileList()

          Show
          Ted Yu added a comment - I agree about the point w.r.t. getStoreFileList()
          Hide
          Lars Hofhansl added a comment - - edited

          Created https://reviews.apache.org/r/4218/ for better commenting.
          Let's get this thing into trunk. And maybe 0.94

          Show
          Lars Hofhansl added a comment - - edited Created https://reviews.apache.org/r/4218/ for better commenting. Let's get this thing into trunk. And maybe 0.94
          Hide
          Lars Hofhansl added a comment -

          Any takers for a review?
          I assume +1 from the FB guys (right Karthik?)

          Show
          Lars Hofhansl added a comment - Any takers for a review? I assume +1 from the FB guys (right Karthik?)
          Hide
          Jesse Yates added a comment -

          Looking at it today - should have my comments in by COB.

          Show
          Jesse Yates added a comment - Looking at it today - should have my comments in by COB.
          Hide
          Ted Yu added a comment -

          I put a few comments on RB.

          Show
          Ted Yu added a comment - I put a few comments on RB.
          Hide
          stack added a comment -

          I added some.

          Show
          stack added a comment - I added some.
          Hide
          Lars Hofhansl added a comment -

          Ran a first few tests on a table with 16 regions and two column families. (small cluster with 6 datanodes/regionservers)
          Snapshot took about two times as long as distcp of the same data (which surprised me).
          Interestingly distcp used 48 mappers while Snapshot used 16 mappers.

          Show
          Lars Hofhansl added a comment - Ran a first few tests on a table with 16 regions and two column families. (small cluster with 6 datanodes/regionservers) Snapshot took about two times as long as distcp of the same data (which surprised me). Interestingly distcp used 48 mappers while Snapshot used 16 mappers.
          Hide
          Lars Hofhansl added a comment -

          I am not happy with this. I think other features (like hardlinks) are missing to make this scheme viable.
          Unscheduling.

          Show
          Lars Hofhansl added a comment - I am not happy with this. I think other features (like hardlinks) are missing to make this scheme viable. Unscheduling.
          Hide
          Karthik Ranganathan added a comment -

          @Lars - I ripped out some code which used the hardlinking - we have implemented it internally. I believe we are planning on opensourcing this, otherwise you'd have to wait for native hardlinks. The current copy approach still works though for a few tens of TB's.

          Show
          Karthik Ranganathan added a comment - @Lars - I ripped out some code which used the hardlinking - we have implemented it internally. I believe we are planning on opensourcing this, otherwise you'd have to wait for native hardlinks. The current copy approach still works though for a few tens of TB's.
          Hide
          Lars Hofhansl added a comment -

          otherwise you'd have to wait for native hardlinks

          Looks like that is not going to happen (HDFS-3370).

          Show
          Lars Hofhansl added a comment - otherwise you'd have to wait for native hardlinks Looks like that is not going to happen ( HDFS-3370 ).
          Hide
          Karthik Ranganathan added a comment -

          I know but I dont get the reason though. Going to put in a couple of comments more, but if its a no go - then oh well.

          Show
          Karthik Ranganathan added a comment - I know but I dont get the reason though. Going to put in a couple of comments more, but if its a no go - then oh well.
          Hide
          Lars Hofhansl added a comment -

          Note needed, now that we have table snapshots.

          Show
          Lars Hofhansl added a comment - Note needed, now that we have table snapshots.

            People

            • Assignee:
              Lars Hofhansl
              Reporter:
              Karthik Ranganathan
            • Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development