Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: documentation, regionserver
    • Labels: None

      Description

      Basic idea behind the backup architecture for HBase

      1. HBase Backups Architecture.docx
        95 kB
        Karthik Ranganathan
      2. HBase Backups Architecture v2.docx
        97 kB
        Karthik Ranganathan

        Issue Links

          Activity

          Karthik Ranganathan added a comment -

          I think we should add this doc to the HBase book. The code parts of this HBase backups feature are already done. I think the next step is to implement a simple wrapper script, and document that as well.

          The tasks are already created, see HBASE-4618 for a list of sub-tasks (tasks 1, 2, 4 and 6 are done, 4 needs to be checked in and closed out).

          The next one to look at would be HBASE-4664. Let me add some comments in there about what we came up with internally, and then we can go ahead from there.

          stack added a comment -

          What should we do w/ this doc Karthik? Seems like still stuff to build out? Should we make issues for what's to be done?

          Karthik Ranganathan added a comment -

          Marking as resolved, feel free to send more comments my way in case something is not clear.

          Karthik Ranganathan added a comment -

          Made modifications as suggested here, and also made certain explanations clearer. Also added a notes/FAQ section based on some questions I have received both here and via email.

          Karthik Ranganathan added a comment -

          @Doug:

          << list all the regions, for each region, ask the RS hosting it for a list of HFiles >>

          There is already an API to get a list of regions and the regionservers hosting them. And we added a new API to the RS to list the HFiles for the regions it hosts.

          << The strategy is great, but it will generate a flurry of (warranted) questions on how the average person does it. >>

          True - but this task is only to make sure the document is easy to read and understand by an average user. We can definitely add more details if needed, but that would risk confusing people. I will definitely incorporate the other suggestions (confusing names, etc.). The rest of the tasks deal with giving average users a way to do backups by running/cron-ing a command, without having to deal with the internals of how it works.
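          As a rough illustration of the listing step described above ("list all the regions, for each region, ask the RS hosting it for a list of HFiles"), here is a minimal Python sketch. The region-location map and the per-regionserver HFile listing are stubbed out as plain callables; in the real feature they correspond to the existing region-location API and the new RS API mentioned in this comment, whose exact signatures are not shown here.

          ```python
          # Illustrative sketch only, not the actual HBase code. The inputs are
          # assumptions: region_locations maps region -> hosting regionserver, and
          # list_hfiles_on_rs(server, regions) asks one RS for the HFiles of the
          # regions it hosts (standing in for the new RS listing API).

          def regions_by_server(region_locations):
              """Group regions under the regionserver currently hosting them."""
              by_server = {}
              for region, server in region_locations.items():
                  by_server.setdefault(server, []).append(region)
              return by_server

          def collect_hfiles(region_locations, list_hfiles_on_rs):
              """Build a region -> [HFile] manifest with one call per regionserver."""
              manifest = {}
              for server, regions in regions_by_server(region_locations).items():
                  for region, hfiles in list_hfiles_on_rs(server, regions).items():
                      manifest[region] = hfiles
              return manifest
          ```

          Grouping by server first keeps it to one listing call per RS rather than one per region.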

          Doug Meil added a comment -

          Hi folks, sorry about the delay in commenting.

          I liked the refresher on "why backup?" in the beginning.

          I also found some of the names confusing (e.g., RBU, CBU).

          The strategy here in the doc is terrific, but I'd like to see this get a little more "actionable" with specifics. For example, in the Stage 1 RBU incremental, "list all the regions, for each region, ask the RS hosting it for a list of HFiles". How is this to be done? Using the Java API to list regions? Reading the HBase files from HDFS? Ostensibly the RS hosting the region has to come from an online API. The strategy is great, but it will generate a flurry of (warranted) questions on how the average person does it.

          stack added a comment -

          Sounds good Karthik.

          Karthik Ranganathan added a comment -

          << For '...incremental backups at the Stage 1 (RBU) level', won't the time between steps b and d be 'large', and during the copy time the list of files could change on you; i.e. when you go to copy a file, it may have been removed because it'd been compacted. What do you do in this case? (Your list may not include the compacted file.) >>

          We look for the deleted files in .Trash and reclaim them. If they are not present, we fail the backup for the region. The backup job runs in loops: the first loop starts out with all regions, the failed regions are output, and the second loop works only on the failed regions. The number of loops is configurable; we have defaulted it to 5.
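          The looping behavior described above can be sketched as follows. This is an illustration of the retry structure only, not the actual backup job; `backup_region` stands in for the per-region copy (including the .Trash reclaim), and the default of 5 loops matches the comment.

          ```python
          # Illustrative sketch (not the actual HBase code) of the looping backup
          # job: each pass retries only the regions that failed in the previous
          # pass, up to a configurable number of loops (defaulting to 5).

          def run_backup(regions, backup_region, max_loops=5):
              """Return (succeeded, still_failed) after up to max_loops passes."""
              pending = list(regions)
              succeeded = []
              for _ in range(max_loops):
                  if not pending:
                      break
                  failed = []
                  for region in pending:
                      if backup_region(region):  # copy HFiles, reclaim from .Trash, etc.
                          succeeded.append(region)
                      else:
                          failed.append(region)  # retried in the next loop
                  pending = failed
              return succeeded, pending
          ```

          Regions that transiently fail (e.g. a file compacted away mid-copy) get another chance on the next pass with a freshly computed file list.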

          << For "a.The backups rely on the clocks across the various region-servers for determining the point in time to which the edits are re-played", so, say a server is lagging the others by a good bit? When replaying the edits, you'd replay edits from when this lagging server said the backup began? >>

          No, right now we just subtract a configurable amount of time (say 5 mins) from the start time of the MR job to keep things simple. We could totally do what you say as an enhancement.

          << How will you know which hlogs to replay? You'll open it and look at first and last edits in the file? Or should we write out metadata files for hlogs? Or is it enough relying on hdfs modtime? >>

          The hlog files are of the format hlog.TIMESTAMP, where TIMESTAMP is the time the log was created. We look at this time to determine the file set. We need all files where TIMESTAMP > start time and TIMESTAMP < finish time, plus the latest file where TIMESTAMP < start time.
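          The selection rule above can be written down concretely. This is a minimal sketch under the stated assumptions (file names of the form `hlog.TIMESTAMP`, numeric timestamps); the clock-skew padding from the previous answer is modeled by subtracting a configurable margin from the start time before selecting.

          ```python
          # Sketch of the hlog selection rule described above. We take every log
          # created strictly inside the backup window, plus the latest log created
          # before the window started (its tail may hold edits from the window).
          # skew_secs models the configurable clock-skew margin subtracted from
          # the MR job's start time, as in the previous answer.

          def hlogs_to_replay(filenames, start_time, finish_time, skew_secs=300):
              start = start_time - skew_secs  # pad for regionserver clock skew
              stamped = sorted((int(f.split(".", 1)[1]), f) for f in filenames)
              in_window = [f for ts, f in stamped if start < ts < finish_time]
              before = [f for ts, f in stamped if ts < start]
              # Latest file that began before the window, if any.
              return ([before[-1]] if before else []) + in_window
          ```

          With `skew_secs=0` this is exactly the TIMESTAMP comparison stated in the comment; a nonzero margin simply widens the window to tolerate lagging clocks.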

          stack added a comment -

          Echo Todd #1 remarks.

          For '...incremental backups at the Stage 1 (RBU) level', won't the time between steps b and d be 'large', and during the copy time the list of files could change on you; i.e. when you go to copy a file, it may have been removed because it'd been compacted. What do you do in this case? (Your list may not include the compacted file.)

          For "a.The backups rely on the clocks across the various region-servers for determining the point in time to which the edits are re-played", so, say a server is lagging the others by a good bit? When replaying the edits, you'd replay edits from when this lagging server said the backup began?

          How will you know which hlogs to replay? You'll open it and look at first and last edits in the file? Or should we write out metadata files for hlogs? Or is it enough relying on hdfs modtime?

          Looks great K.

          Karthik Ranganathan added a comment -

          For #1, totally. Internally, we use the term "cluster" to denote a section of the data center (as opposed to the HBase cluster); a data center is composed of a number of "clusters", hence the name. In-DC and cross-DC work.

          For #2, this makes the running cluster stall and not take updates for the duration of the copy. It is a fast copy with hard links underneath, but there is nothing in the current design that would stop it from being used against a remote cluster or a DFS version without the hard link. Also, if for some reason the hard link fails, it does a deep copy, so it could have longer stalls.
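          The hard-link-with-fallback behavior described above can be sketched in miniature with local files. This is only an analogy for the HDFS-level fast copy, not the actual implementation; the function name and return values are illustrative.

          ```python
          # Illustrative sketch of fast-copy semantics: try a hard link first
          # (near-instant, shares the underlying blocks), and fall back to a deep
          # copy if linking fails, e.g. on a filesystem without hard-link support.
          # A deep copy takes longer, hence the longer stalls noted above.

          import os
          import shutil

          def fast_copy(src, dst):
              """Hard-link src to dst when possible; otherwise do a full copy."""
              try:
                  os.link(src, dst)
                  return "hardlink"
              except OSError:
                  shutil.copy2(src, dst)  # deep copy preserving metadata
                  return "deepcopy"
          ```

          Either path leaves an identical file at `dst`; only the time spent (and thus the length of the update stall) differs.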

          Todd Lipcon added a comment -

          Two quick notes from looking over the doc:

          • the names are a little confusing to me - "in-cluster back up" is actually two clusters, right? I'd call your "RBU" an in-cluster backup, I'd call your CBU an "in-datacenter backup", and I'd call your DBU a "cross-datacenter backup", "DR backup", or "BCP backup".
          • For RBU, maybe we can get atomicity in a simpler manner by having the region server initiate the copy of hfiles? It can hold the lock to block flushes while the copies happen (they're hard-link copies, right?)
          Karthik Ranganathan added a comment -

          Sounds great Doug! Maybe we make a new section, keep adding stuff in, and deprecate the old stuff? Or whatever works...

          Doug Meil added a comment -

          I'll gladly port this to the book, and I'd like to add this in here...
          http://hbase.apache.org/book.html#ops.backup
          ... with the existing backup info.

          Karthik Ranganathan added a comment -

          Basic HBase backup architecture and the various levels of protection it would offer.


            People

            • Assignee:
              Karthik Ranganathan
              Reporter:
              Karthik Ranganathan
            • Votes:
              0
              Watchers:
              11

              Dates

              • Created:
                Updated:
                Resolved:
