HBase
HBASE-1867

Tool to regenerate an hbase table from the data files

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 0.20.0
    • Fix Version/s: 0.20.2, 0.90.0
    • Component/s: util
    • Labels:
      None

      Description

      The purpose of this JIRA is provide a place to coordinate the development of a utility that will regenerate an hbase table from the data files.

      Here are some comments from stack on this subject from the hbase-user mailing list:

      Well, in the bin directory, there are scripts that do various things with
      the .META. (copy a table, move a table, load a table whose source is hfiles
      written by a mapreduce job; i.e. hbase-48).

      So, to 'regenerate an hbase table from the data files', you'd need to do
      something like the following:

      + delete all existing table references from .META.
      + move the backed-up table into position under hbase.rootdir
      + per region under hbase.rootdir, add an entry to .META. Do this by opening
      the .regioninfo file. Its content is needed to generate the rowid for
      .META. and its value becomes the info:regioninfo cell value.

      HBase does not need to be down. On next .META. scan, the newly added
      regions will be noticed. They won't have associated info:server and
      info:startcode entries so master will go ahead and assign them and you
      should be up and running.

      Code-wise, a study of copy_table.rb (this uses old api ... needs updating
      but the concepts are the same) and loadtable.rb would probably be fruitful.
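The per-region step described above can be sketched in plain Ruby. This is a hypothetical illustration, not the real script: `collect_region_entries` and its plain-file read stand in for HBase's deserialization of an HRegionInfo from each .regioninfo file.

```ruby
require 'fileutils'
require 'tmpdir'

# For each region directory under the table directory, read .regioninfo
# and collect the entry that would be written to .META.. The real script
# deserializes an HRegionInfo here to get the row key and the
# info:regioninfo cell value; this sketch just reads the raw bytes.
def collect_region_entries(table_dir)
  Dir.glob(File.join(table_dir, '*', '.regioninfo')).sort.map do |path|
    { region: File.basename(File.dirname(path)),
      regioninfo: File.read(path) }
  end
end

# Demo against a throwaway directory laid out like a table dir.
Dir.mktmpdir do |root|
  %w[region-a region-b].each do |r|
    FileUtils.mkdir_p(File.join(root, r))
    File.write(File.join(root, r, '.regioninfo'), "serialized HRegionInfo for #{r}")
  end
  puts collect_region_entries(root).map { |e| e[:region] }.inspect
  # prints ["region-a", "region-b"]
end
```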

        Issue Links

          Activity

          Jon Graham added a comment -

          Thanks Elsif for creating this JIRA

          stack added a comment -

          So, for this script... what will we pass it? The location of a table in hdfs? This table could be under hbase.rootdir or elsewhere. If elsewhere, the script would also do the move into place?

          If there is already a table under /hbase of same name... script could move it aside.

          elsif added a comment -

          The input arguments would be the hdfs path and optionally a new name for the table:

          regenerate_table.rb HDFS_URL [TABLE_NAME]

          If the table already exists the user would be prompted for instructions to move the table aside, remove it, or cancel the operation.

          stack added a comment -

          Above is fine except the bit about users being prompted to move the table aside... how about we just move it aside and tell the user we did it, rather than doing user interaction in the script?
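A minimal sketch of that non-interactive behavior, in plain Ruby (the method name and the timestamp suffix are illustrative, not taken from the actual script):

```ruby
require 'fileutils'
require 'tmpdir'

# If the table directory already exists, rename it with a timestamp
# suffix and tell the user what was done instead of prompting.
def move_aside_if_present(table_dir)
  return nil unless File.directory?(table_dir)
  aside = "#{table_dir}.#{Time.now.strftime('%Y%m%d%H%M%S')}"
  FileUtils.mv(table_dir, aside)
  puts "Existing table moved aside to #{aside}"
  aside
end

Dir.mktmpdir do |root|
  table = File.join(root, 'TABLENAME')
  FileUtils.mkdir_p(table)
  move_aside_if_present(table)               # renames and reports
  puts move_aside_if_present(table).inspect  # nothing left: prints nil
end
```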

          stack added a comment -

          Here's a start. Reads in args, sets up the filesystem, moves aside any extant table directory, and moves the pointed-to directory into place. Untested. Still a bunch to do. (If someone wants to take over, be my guest.)

          stack added a comment -

          Here's a first cut. Needs a bit of testing. Seems to work fine on a small table of ten regions IFF the passed directory is a copied-aside table. Does not yet work when passing an arbitrary table name (needs a bit of messy reconstruction of the HTableDescriptor with the new name, then renaming of the region directory with recalculation of the region's encoded name).

          Woosuk Suh added a comment -

          Your script worked perfectly on an HBase cluster of 4 running machines holding 1011 regions.
          I used your script because our .META. region evaporated for some unknown reason.
          FYI, each server has 5GB of memory and an Intel quad-core CPU.

          But sometimes I was not able to run the script because errors happened.
          So I'm going to describe the whole process I went through to make the script work.

          1. Our table had this structure on HDFS:
          /hbase/TABLENAME

          2. So I moved /hbase to /hbase_backup:
          /hbase_backup/TABLENAME

          3. Then I started HBase so the broken .META. table would regenerate cleanly. After starting HBase:
          /hbase/
          /hbase_backup/TABLENAME

          4. Then I ran the script like this:
          bin/hbase org.jruby.Main add_table.rb hdfs://our.server.addr:port/hbase_backup/TABLENAME
          I got an error from line 105 of the script: the statuses were "nil" objects, and it could not iterate over nil objects.

          5. I'm not familiar with Ruby, but in Python it would be impossible to iterate through None objects.
          I printed the tableDir with LOG.info(tableDir.toString()) and got the following:
          our.server.addr:port/hbase_backup/TABLENAME

          6. So I tried to copy hbase_backup/TABLENAME to hbase/TABLENAME as follows:
          bin/hadoop dfs -cp hbase_backup/TABLENAME hbase/TABLENAME

          7. After a long time, the copy finished, and I ran the script again with the following command:
          bin/hbase org.jruby.Main add_table.rb hdfs://our.server.addr:port/hbase/TABLENAME
          It worked without any error, and all the regions were restored!

          I hope this usage information helps you improve the code.
          Thanks for the fabulous script!
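For reference, the sequence that ultimately worked above condenses to the following commands (the server address and TABLENAME are placeholders from the report; a running Hadoop/HBase installation is assumed, so these are not runnable standalone):

```shell
# Steps 1-3: move the damaged tree aside and let HBase recreate a clean .META.
bin/hadoop dfs -mv /hbase /hbase_backup
bin/start-hbase.sh

# Step 6: copy the table back under hbase.rootdir
bin/hadoop dfs -cp /hbase_backup/TABLENAME /hbase/TABLENAME

# Step 7: re-add the regions to .META.
bin/hbase org.jruby.Main add_table.rb hdfs://our.server.addr:port/hbase/TABLENAME
```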

          stack added a comment -

          @wooksuh Thanks for the report. Let me try with table in a different location. It must be something to do w/ qualified names in hdfs. Let me figure it. Also, can we figure what happened to your .META. table? This is 0.20.0 hbase? Enable DEBUG log level in case it happens again.

          stack added a comment -

          Fix the wooksuh issue for 0.20.1

          Woosuk Suh added a comment -

          @stack
          Yes, I'm using version 0.20.0 of HBase with version 0.20.0 of Hadoop.
          I'm going to enable the DEBUG log level, as I have witnessed the .META. table problem several times.
          I'll give my feedback on the mailing list, with the log attached, when the problem happens again.
          Thanks!

          stack added a comment -

          This should fix the issue you were seeing...

          stack added a comment -

          This doesn't have to be in 0.20.1. We can point anyone who needs this script to this issue. Moving it out.

          Woosuk Suh added a comment -

          Cool! You are definitely thrilling me!
          I will test your fixed version when possible and give feedback here.

          I am also trying to catch the .META. table problem, but it has not happened since the last occurrence.
          What a typical characteristic of a bug: when you need it to happen, it never does; when you don't, it happens.

          stack added a comment -

          Added this script. It's of general utility.

          Jarrod Cuzens added a comment -

          Just wanted to comment: I ran an alter on my HBase table, and I think this does not modify the .regioninfo files in each HRegion.

          add_table.rb uses the .regioninfo files to rebuild the .META. table, and if these are not updated by an alter (even with a major_compact), the table won't be restored as you would expect.

          Jarrod Cuzens added a comment -

          Attaching this issue, which describes the problem with altering a table and then using add_table.rb.


            People

            • Assignee:
              stack
              Reporter:
              elsif
            • Votes:
              0
              Watchers:
              5

              Dates

              • Created:
                Updated:
                Resolved:
