HBASE-66: Add support for migrating between hbase versions

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.2.0
    • Component/s: util
    • Labels:
      None

      Description

      If HBase is to be used to serve data to live systems, we need a way to upgrade both the underlying Hadoop installation and HBase to newer versions with minimal downtime.

          Activity

          Bryan Duxbury added a comment -

          What do we think of creating Rails-esque migration scripts for each version change? Each migration script would have to know how to read the previous version's data and write the current version's. Then we can add a version number to any files we store in HDFS to indicate which version created them.

          When you deploy a new version of HBase, you first need to run the migration script. Once the migration has completed successfully, you should be able to start the new version of HBase. Also, the migration script should prefer creating new files over overwriting the existing ones. That way, we're less likely to make a mistake that causes the old data to be lost while the new data is being written.
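A minimal sketch of the proposal above, not an HBase implementation: one migration step per version change, a version stamp stored with the data, and every step writing into a fresh directory so the old files are never overwritten. The file name `hbase.version`, the step functions, and the directory layout are all hypothetical, and a local directory stands in for HDFS.

```python
import shutil
from pathlib import Path
from typing import Callable, Dict

VERSION_FILE = "hbase.version"   # hypothetical name for the version stamp
CURRENT_VERSION = 3              # hypothetical version of the new release

# A step reads the old layout from src and writes the next layout into dst.
Step = Callable[[Path, Path], None]


def step_1_to_2(src: Path, dst: Path) -> None:
    """Illustrative step: copy every data file, adding a '.v2' suffix."""
    for f in src.iterdir():
        if f.name != VERSION_FILE:
            shutil.copy(f, dst / (f.name + ".v2"))


def step_2_to_3(src: Path, dst: Path) -> None:
    """Illustrative step: the layout is unchanged, so files are copied as-is."""
    for f in src.iterdir():
        if f.name != VERSION_FILE:
            shutil.copy(f, dst / f.name)


# Steps are keyed by the version they migrate from.
STEPS: Dict[int, Step] = {1: step_1_to_2, 2: step_2_to_3}


def read_version(root: Path) -> int:
    return int((root / VERSION_FILE).read_text().strip())


def migrate(root: Path, work: Path) -> Path:
    """Apply steps in order until CURRENT_VERSION is reached.

    Each step writes into a new directory under work, so root is left
    untouched. Returns the directory holding the migrated data.
    """
    version = read_version(root)
    src = root
    while version < CURRENT_VERSION:
        if version not in STEPS:
            raise RuntimeError(f"no migration from version {version}")
        dst = work / f"v{version + 1}"
        dst.mkdir(parents=True)
        STEPS[version](src, dst)
        (dst / VERSION_FILE).write_text(str(version + 1))
        src, version = dst, version + 1
    return src
```

Because the original directory is only read, a failed run can be discarded and retried, at the cost of temporarily holding more than one copy of the data.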

          stack added a comment -

          I like the Rails idea. I'd say migration should support going in both directions.

          hbase state is all kept out in the filesystem, so hopefully filesystem machinations should be all that's required to make migrations.

          HStoreFiles are MapFiles + an info file stored in a sympathetic directory. This info file has little in it currently – just the sequence id. It could also hold the hbase version. For log files, perhaps the first record is a stamp of the hbase version doing the writing.

          It occurred to me that migrations could entail significant rewriting of on-filesystem data. To distribute the migration, we could have the master and regionservers run the migrations. Each server on startup would look for any migrations to run and run them if any are found. The nice thing about this is that we'd get the migration job distributed. But thinking on it, it's probably better to have the migration done outside of hbase in its own dedicated MR job. It would be easier to track failures and run reversals.
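The version-stamp idea in the comment above (a version in the store's info file, and a stamp as the first record of a log) could look roughly like the sketch below. The byte layout and the `VERSION` header line are assumptions made for illustration; they are not the actual HStoreFile info or HLog formats.

```python
import struct

# Hypothetical layouts: big-endian, no padding.
LEGACY_INFO = struct.Struct(">q")    # sequence id only (info file today)
STAMPED_INFO = struct.Struct(">qi")  # sequence id followed by a version


def write_info(stream, sequence_id, version):
    """Write an info record that carries the writer's version."""
    stream.write(STAMPED_INFO.pack(sequence_id, version))


def read_info(stream):
    """Return (sequence_id, version); version is None for legacy files."""
    data = stream.read()
    if len(data) == LEGACY_INFO.size:
        (sequence_id,) = LEGACY_INFO.unpack(data)
        return sequence_id, None
    return STAMPED_INFO.unpack(data)


def write_log(stream, version, records):
    """Write a text log whose first record is the version stamp."""
    stream.write(f"VERSION {version}\n")
    for record in records:
        stream.write(record + "\n")


def read_log(stream):
    """Return (version, records), rejecting logs without a stamp."""
    header = stream.readline().strip()
    if not header.startswith("VERSION "):
        raise ValueError("log has no version stamp")
    version = int(header.split()[1])
    return version, [line.rstrip("\n") for line in stream]
```

Treating a missing stamp in the info file as "legacy" lets a migration tool recognise files written before the stamp was introduced.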

          Bryan Duxbury added a comment -

          I would say -1 on going back in versions of hbase, because a reverse migration doesn't readily address how to deal with features and settings not available in prior versions. You'd have potential for loss of precision per se in terms of configuration.

          Also, I'm not sure why you'd ever want to go from a later version to an earlier version, unless you've mistakenly upgraded in the first place. I suggest that you take care and back up your old HDFS files before migrating instead.

          stack added a comment -

          I ain't too invested in our supporting reverse migrations, but it's worth noting that any migration system worth its salt – systems I've worked on in the past and Ruby on Rails – goes both ways, if only to facilitate testing of the forward migration (inevitably there's a bug when you try to migrate real data).

          Jim Kellerman added a comment -

          stack wrote:
          > I ain't too invested in our supporting reverse migrations but its worth noting that any migration system worth its salt -
          > systems I've worked on in the past and ruby on rails - go both ways if only to facilitate testing of the forward migration
          > (inevitably there's a bug when you try to migrate real data).

          That's what backups are for.

          More importantly though, HADOOP-2478 incorporates a migration tool. The specifics of what the tool does will have to be rewritten for each upgrade, but I think the framework is good.

          Jim Kellerman added a comment -

          Marking this issue as being incorporated by HADOOP-2478. HADOOP-2478 has a migration tool.

          stack added a comment -

          There is no framework that I can see in HADOOP-2478. There is just a single script that addresses a single migration incident.

          stack added a comment -

          After chatting w/ Jim: we need to work out a design for how we see migrations working going forward. I made a start here: http://wiki.apache.org/hadoop/Hbase/Migration

          Bryan Duxbury added a comment -

          The plan on the wiki seems good, and implemented already, at least in part. Should we close this generic migration issue and in the future make new issues for each migration step we add? Right now this issue isn't really adding very much.

          stack added a comment -

          Agree w/ Bryan's comment above. Resolving.


            People

            • Assignee: Unassigned
            • Reporter: Johan Oskarsson
            • Votes: 0
            • Watchers: 0
