HBASE-1621

merge tool should work on online cluster

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Taking down the entire cluster to merge 2 regions is a pain. I don't see why the table, or the regions specifically, couldn't be taken offline, merged, then brought back up.

      This might need a new API on the regionservers so they can take direction from more than just the master.

      1. online_merge.rb
        10 kB
        Jean-Daniel Cryans
      2. hbase-onlinemerge.patch
        19 kB
        Sebastian Bauer
      3. HBASE-1621-v2.patch
        17 kB
        Jean-Daniel Cryans
      4. HBASE-1621.patch
        17 kB
        Jean-Daniel Cryans
      5. 1621-trunk.txt
        19 kB
        Ted Yu

          Activity

          stack added a comment -

          Agreed.

          stack added a comment -

          Move out of 0.20.1?

          stack added a comment -

          Moving out of 0.20.1 after chatting with Ryan.

          stack added a comment -

          Or rather, we shouldn't have to take the table offline at all... just close regions to be merged, merge them while all else is online, then update meta.

          Jean-Daniel Cryans added a comment -

          Here's an initial implementation for online merging, along with a unit test. I tested it on a small table in a small cluster, and it worked as advertised. There's currently no rollback, so if anything goes wrong during the merge the 2 regions will stay OFFLINE, which means that you need to disable/enable the table to get them back.

          Based off branch.

          Jean-Daniel Cryans added a comment -

          Patch that actually only merges the specified table.

          After fixing it I tried it on a cluster, shaved off around 350 regions on a big table in about 8 minutes.

          Jean-Daniel Cryans added a comment -

          This won't happen in 0.90 unless someone really needs it.

          Jonathan Gray added a comment -

          There is a slight chance that we will implement this with a bunch of HBCK improvements we are working on. In any case, I'm +1 on punting the jira.

          Daniel Einspanjer added a comment -

          Curious if the idea could be pushed even further and allow the merge to analyze the regions to find pairs to be merged, then specifically offline those regions and merge them rather than disabling the table..
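
The pair-finding idea in this comment could be sketched roughly as below. This is a plain-Ruby illustration, not code from any patch attached to this issue: the `Region` struct, its `size_mb` field, and the `max_merged_mb` threshold are all hypothetical names introduced for the example, and a real tool would read region boundaries and sizes from the cluster instead.

```ruby
# Hypothetical region record: start/end keys plus an on-disk size in MB.
Region = Struct.new(:name, :start_key, :end_key, :size_mb)

# Walk regions in key order and pair up neighbours whose combined size
# stays under a threshold. Each region is used in at most one pair.
def find_merge_pairs(regions, max_merged_mb)
  sorted = regions.sort_by(&:start_key)
  pairs = []
  i = 0
  while i < sorted.length - 1
    a = sorted[i]
    b = sorted[i + 1]
    adjacent = a.end_key == b.start_key
    if adjacent && a.size_mb + b.size_mb <= max_merged_mb
      pairs << [a, b]
      i += 2 # both regions are consumed by this pair
    else
      i += 1
    end
  end
  pairs
end

# Made-up sample data; an empty key marks the start or end of the table.
regions = [
  Region.new("r1", "",  "b", 10),
  Region.new("r2", "b", "d", 20),
  Region.new("r3", "d", "f", 900),
  Region.new("r4", "f", "h", 5),
  Region.new("r5", "h", "",  15),
]
pairs = find_merge_pairs(regions, 256)
pairs.each { |a, b| puts "#{a.name} + #{b.name}" }
```

Only the regions in the returned pairs would then need to be taken offline, which is the point of the suggestion: the rest of the table stays available.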

          Sebastian Bauer added a comment -

          New patch works with trunk rev 1069743. All tests passed, and later today I will test it on our cluster.

          Sebastian Bauer added a comment -

          Issue with new patch:

          11/02/11 12:48:52 DEBUG regionserver.HRegion: Opening region: REGION => {NAME => 'Golden_ATU,6863-g_75FAEC36055608836CA66DCF3F587EF5,1297363213791.25a54a0a31f43b31c3273ea8124febc3.', STARTKEY => '6863-g_75FAEC36055608836CA66DCF3F587EF5', ENDKEY => '6867-d_2010_8_8_23B88FA03D74F4A93374B12F60292B4F', ENCODED => 25a54a0a31f43b31c3273ea8124febc3, OFFLINE => true, TABLE => {{NAME => 'Golden_ATU', FAMILIES => [{NAME => 'c', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', COMPRESSION => 'LZO', VERSIONS => '1', TTL => '-1', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
          11/02/11 12:48:52 DEBUG regionserver.HRegion: Instantiated Golden_ATU,6863-g_75FAEC36055608836CA66DCF3F587EF5,1297363213791.25a54a0a31f43b31c3273ea8124febc3.
          11/02/11 12:48:52 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
          11/02/11 12:48:52 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev /home/django/hadoop-lzo/src/get_build_revision.sh: line 4: git: command not found]
          11/02/11 12:48:52 INFO compress.CodecPool: Got brand-new compressor
          11/02/11 12:48:52 FATAL util.Merge: Merge failed
          java.lang.IllegalArgumentException: Wrong FS: hdfs://db2a:50001/hbase/Golden_ATU/25a54a0a31f43b31c3273ea8124febc3/.regioninfo, expected: file:///
          at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
          at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:47)
          at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:357)
          at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
          at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)
          at org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:430)
          at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:355)
          at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2713)
          at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2699)
          at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2668)
          at org.apache.hadoop.hbase.util.OnlineMerge.merge(OnlineMerge.java:290)
          at org.apache.hadoop.hbase.util.OnlineMerge.mergeRegions(OnlineMerge.java:230)
          at org.apache.hadoop.hbase.util.OnlineMerge.run(OnlineMerge.java:118)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
          at org.apache.hadoop.hbase.util.OnlineMerge.main(OnlineMerge.java:393)

          Sebastian Bauer added a comment -

          The last traceback happened because onlinemerger was running from a client that doesn't have the Hadoop configuration, but this one is something to track down:

          11/02/11 13:56:52 DEBUG regionserver.Store: closed c
          11/02/11 13:56:52 INFO regionserver.HRegion: Closed Golden_ATU,6879-w_2010_27_6CC5E20C77D7EA4771A44B432B281BD7,1287089222938.6dac71a9736509ad4cbbfa3f58df2f41.
          11/02/11 13:56:52 DEBUG regionserver.HRegion: Closing Golden_ATU,6879-w_2010_28_383BE4774DBF96F224EEE3A41282E0D8,1287089224381.abb281eff02fdb5bc935d71a8dd7f27a.: disabling compactions & flushes
          11/02/11 13:56:52 DEBUG regionserver.HRegion: Updates disabled for region Golden_ATU,6879-w_2010_28_383BE4774DBF96F224EEE3A41282E0D8,1287089224381.abb281eff02fdb5bc935d71a8dd7f27a.
          11/02/11 13:56:52 DEBUG hfile.HFile: On close of file hdfs://db2a:50001/hbase/Golden_ATU/abb281eff02fdb5bc935d71a8dd7f27a/c/5936293646315145921 evicted 0 block(s) of 149 total blocks
          11/02/11 13:56:52 DEBUG regionserver.Store: closed c
          11/02/11 13:56:52 INFO regionserver.HRegion: Closed Golden_ATU,6879-w_2010_28_383BE4774DBF96F224EEE3A41282E0D8,1287089224381.abb281eff02fdb5bc935d71a8dd7f27a.
          11/02/11 13:56:52 FATAL util.Merge: Merge failed
          java.io.IOException: Files have same sequenceid: 1305035413
          at org.apache.hadoop.hbase.regionserver.HRegion.merge(HRegion.java:2952)
          at org.apache.hadoop.hbase.util.OnlineMerge.merge(OnlineMerge.java:294)
          at org.apache.hadoop.hbase.util.OnlineMerge.mergeRegions(OnlineMerge.java:230)
          at org.apache.hadoop.hbase.util.OnlineMerge.run(OnlineMerge.java:118)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
          at org.apache.hadoop.hbase.util.OnlineMerge.main(OnlineMerge.java:393)
          11/02/11 13:56:52 DEBUG wal.HLog: main.logSyncer interrupted while waiting for sync requests
          11/02/11 13:56:52 INFO wal.HLog: main.logSyncer exiting
          11/02/11 13:56:52 DEBUG wal.HLog: closing hlog writer in hdfs://db2a:50001/user/hbase/.logs_1297429009717
          11/02/11 13:56:52 DEBUG wal.HLog: Moved 1 log files to /user/hbase/.oldlogs
          11/02/11 13:56:53 INFO util.Merge: Verifying that file system is available...
          11/02/11 13:56:53 INFO util.Merge: Verifying that HBase is running...
          11/02/11 13:56:53 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT
          11/02/11 13:56:53 INFO zookeeper.ZooKeeper: Client environment:host.name=db2a.goldenline.pl

          stack added a comment -

          We need to fix this issue before we can do merges.

          stack added a comment -

          We need this in 0.92.

          Ted Yu added a comment -

          Adapted onlinemerge to TRUNK.

          The unit test didn't seem to make progress and repeatedly printed:

          2011-07-06 16:50:56,131 DEBUG [main] client.HTable$ClientScanner(1103): Finished with scanning at REGION => {NAME => '.META.,,1 TableName => .META.', STARTKEY => '', ENDKEY => '', ENCODED => 1028785192,}
          2011-07-06 16:50:56,331 DEBUG [main] client.HTable$ClientScanner(1023): Creating scanner over .META. starting at key ''
          2011-07-06 16:50:56,331 DEBUG [main] client.HTable$ClientScanner(1116): Advancing internal scanner to startKey at ''
          2011-07-06 16:50:56,332 INFO  [main] util.OnlineMerge(265): mrtest,,1309996082762.1688f19bfa69d68a090aa8b90bfa1129.
          
          stack added a comment -

          Just ran into this trying to do some table surgery. It's silly. We need to fix it.

          stack added a comment -

          Pulling this back into 0.92 and assigning myself. This is a pretty critical one. I'd like to try to do it for 0.92 (Thanks Ted for being more of a man than me moving the issues out of 0.92... I was unable to cut more).

          stack added a comment -

          Need this so we can do improvements to hbck --fix (the improvements to hbck --fix can happen out of band with the 0.92 release, but we need this in place).

          Andy Sautins added a comment -

          Would it be possible for the implementation to not have to compact before and after the merge is done? Looking at HRegion.merge it starts by compacting stores on both regions and then at the end compacts the stores for the new merged region before completing. Would it be possible to just merge all the store files and let the next major compaction on the new region handle store compaction? It would be nice to keep the amount of time the table needs to be disabled to a minimum. If one or both of the regions being merged is large it could take a meaningful amount of time to complete the merge.

          stack added a comment -

          I was thinking of an implementation that just ensured that the regions to merge were first all closed out on the cluster and then it'd create the new merge region and just move the storefiles from the old regions into place under the new merge region – then deploy. No writing of new store files. Let any compactions, etc., happen post assign.
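
The "move the storefiles, write nothing new" approach described here can be sketched as follows. This is only an illustration on the local filesystem with a throwaway directory standing in for the table directory; a real implementation would go through the HDFS API and would also have to create the new region's metadata and update .META., none of which is shown. The layout `<table_dir>/<region>/<family>/<storefile>` and every name in the example are assumptions.

```ruby
require "fileutils"
require "tmpdir"

# Move every store file of the old regions under the new region's family
# directory. No data is rewritten; files are only renamed into place.
# Assumed layout: <table_dir>/<region>/<family>/<storefile>.
def move_store_files(table_dir, old_regions, new_region, families)
  moved = 0
  families.each do |family|
    target = File.join(table_dir, new_region, family)
    FileUtils.mkdir_p(target)
    old_regions.each do |region|
      Dir.glob(File.join(table_dir, region, family, "*")).sort.each do |src|
        dest = File.join(target, File.basename(src))
        raise "store file name collision: #{dest}" if File.exist?(dest)
        FileUtils.mv(src, dest)
        moved += 1
      end
    end
  end
  moved
end

# Demo on a temporary local directory (hypothetical region and file names).
table_dir = Dir.mktmpdir("table")
{ "regionA" => %w[111 222], "regionB" => %w[333] }.each do |region, files|
  dir = File.join(table_dir, region, "c")
  FileUtils.mkdir_p(dir)
  files.each { |f| File.write(File.join(dir, f), "data") }
end
moved = move_store_files(table_dir, %w[regionA regionB], "merged", ["c"])
merged_files = Dir.children(File.join(table_dir, "merged", "c")).sort
puts "moved #{moved} files: #{merged_files.join(', ')}"
```

Because the files are only renamed, the time the regions spend closed would depend on the number of store files rather than on their size, which addresses the compaction concern raised in the previous comment.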

          Jean-Daniel Cryans added a comment -

          This is my current WIP on an online merging script.

          It requires that you download this file and put it on the ruby path (or put it in the folder you are in when running online_merge): http://gitorious.org/trollop/mainline/blobs/raw/master/lib/trollop.rb

          (learn more about trollop here: http://trollop.rubyforge.org/)

          Currently it:

          • merges whole table or a range of regions
          • skips the regions that are splitting or just split
          • merges any number of regions together, default being 5
          • does a dry run by default

          Work needed:

          • a ton of refactoring
          • a log to be able to continue a merge if anything weird happened
          • more smarts on handling splits

          We removed thousands of regions already on our clusters with this script (or some older and buggier version of it).
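
The behaviour listed above (batches of up to N regions, skipping regions that are splitting, dry run by default) might be planned along these lines. This is a hedged sketch and not an excerpt from online_merge.rb: `RegionInfo`, `plan_batches`, and `run` are hypothetical names, and the "merge" step is just a printed message.

```ruby
# Hypothetical region record; `splitting` marks regions that are mid-split
# or were just split and should be left alone.
RegionInfo = Struct.new(:name, :splitting)

# Group consecutive mergeable regions into batches of at most `batch_size`.
# A splitting region is skipped and also ends the current batch, since the
# regions on either side of it are not contiguous with each other.
def plan_batches(regions, batch_size = 5)
  batches = []
  current = []
  flush = lambda do
    batches << current if current.length > 1 # a lone region is not a merge
    current = []
  end
  regions.each do |region|
    if region.splitting
      flush.call
      next
    end
    current << region
    flush.call if current.length == batch_size
  end
  flush.call
  batches
end

# Dry run is the default: nothing is changed unless dry_run is false.
def run(regions, batch_size: 5, dry_run: true)
  plan_batches(regions, batch_size).map do |batch|
    names = batch.map(&:name)
    if dry_run
      puts "would merge: #{names.join(', ')}"
    else
      puts "merging: #{names.join(', ')}" # a real merge would go here
    end
    names
  end
end

# Eight made-up regions in key order; r4 is flagged as splitting.
regions = (1..8).map { |i| RegionInfo.new("r#{i}", i == 4) }
plan = run(regions, batch_size: 3)
```

Keeping the planning step free of side effects makes the dry run trivially safe, and the same plan could be written to a log so that an interrupted merge can be resumed, which is one of the work items listed above.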

          Jean-Daniel Cryans added a comment -

          New version of the script, this time without the errors included by some last minute changes.

          gaojinchao added a comment -

          @J-D

          Will the Java API be supported? We want to use this API to do some automatic monitoring.
          If not, we will convert your code.

          Daniel Einspanjer added a comment -

          @J-D

          I noticed a potential small problem with the script while reading through it.
          At the beginning, it calls HBaseAdmin.balanceSwitch(false), but after the initial scan, there are two Trollop::die statements that would prevent the balancer from being turned back on.

          I think that you need to test the two break conditions in an if block, and then turn the balancer back on before you call die.

          Jean-Daniel Cryans added a comment -

          Will the java api be supported?

          I'm not sure I understand the question, but the reason a script was built in the first place (instead of java code) is to be able to include it in a 0.90 release without breaking compatibility, while offering development flexibility. It could eventually be included in hbck.

          Jean-Daniel Cryans added a comment -

          I think that you need to test the two break conditions in an if block, and then turn the balancer back on before you call die.

          True, also it needs to handle the case where the balancer was already switched off and shouldn't be turned back on at the end.
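
One way to cover both points (restore the balancer on every exit path, and leave it off if it was already off) is to remember the previous state and restore it in an `ensure` block. The sketch below uses a stand-in `FakeAdmin` class instead of the real admin object; it assumes only that the balancer switch returns the previous setting, and `with_balancer_off` is a hypothetical helper name.

```ruby
# Stand-in for the admin object. It only mimics a balancer switch that
# returns the previous setting; it is not the real HBaseAdmin.
class FakeAdmin
  attr_reader :balancer_on

  def initialize(balancer_on)
    @balancer_on = balancer_on
  end

  def balance_switch(on)
    previous = @balancer_on
    @balancer_on = on
    previous
  end
end

# Turn the balancer off for the duration of the block, then restore
# whatever state it had before, even if the block raises or exits early.
def with_balancer_off(admin)
  previous = admin.balance_switch(false)
  begin
    yield
  ensure
    admin.balance_switch(previous)
  end
end

# Case 1: balancer was on and the work aborts; it is switched back on.
admin = FakeAdmin.new(true)
begin
  with_balancer_off(admin) { raise ArgumentError, "bad range" }
rescue ArgumentError => e
  puts "aborted: #{e.message}"
end
puts "balancer on again: #{admin.balancer_on}"

# Case 2: balancer was already off; it stays off afterwards.
already_off = FakeAdmin.new(false)
with_balancer_off(already_off) { }
puts "balancer left off: #{!already_off.balancer_on}"
```

Ruby runs `ensure` blocks when `exit` is called as well, since `exit` raises SystemExit, so this pattern should also cover an early exit from inside the block, provided the die call ends the script through `exit`.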

          stack added a comment -

          So I think the fact that we have this script to do online merge takes the heat off this issue. I do not think it a blocker on 0.92 anymore.

          stack added a comment -

          Undoing this as blocker now we have a merge script that has been run a few times in production; having such script takes the heat off the need for this... but we still need it. Marking major.

          Jimmy Hu added a comment -

          I ran this script on CDH3u0 and got the following error:

          online_merge.rb.bak:112: undefined method `each' for #<#<Class:01x4eb3c24f>:0x7b99f8e6> (NoMethodError)

          Line 112 is:
          for result in resultScanner

          The HBase version is 0.90.1. Does this script only work with HBase 0.92.0?

          Jonathan Gray added a comment -

          Punt to 0.92.1 or 0.94.0?

          Ted Yu added a comment -

          This feature can be implemented in 0.94

          Jonathan Hsieh added a comment -

          I've written some merging code in HBASE-5128 but having some problems with closing regions and regions in transition. Will likely borrow code from here.

          Hudson added a comment -

          Integrated in HBase-TRUNK-security #81 (See https://builds.apache.org/job/HBase-TRUNK-security/81/)
          [book] book.xml, ops_mgt.xml additional clarification that compaction does not do region merging
          also, added link to script in HBASE-1621 in Ops Mgt chapter.

          dmeil :
          Files :

          • /hbase/trunk/src/docbkx/book.xml
          • /hbase/trunk/src/docbkx/ops_mgt.xml
          Hudson added a comment -

          Integrated in HBase-TRUNK #2639 (See https://builds.apache.org/job/HBase-TRUNK/2639/)
          [book] book.xml, ops_mgt.xml additional clarification that compaction does not do region merging
          also, added link to script in HBASE-1621 in Ops Mgt chapter.

          dmeil :
          Files :

          • /hbase/trunk/src/docbkx/book.xml
          • /hbase/trunk/src/docbkx/ops_mgt.xml
          Alexey Zotov added a comment -

          We use the online_merge.rb script. It has a small bug.

          When we start the script in dry-run mode (without the "-n" option), it creates empty directories in HDFS:

          0           hdfs://<HOSTNAME>/hbase/<TABLE_NAME>/0d5a0784d3ce2450ce1936efe3d0232a
          1975581     hdfs://<HOSTNAME>/hbase/<TABLE_NAME>/2005c8cfd13b48bda6fb7faaacaf8cd4
          0           hdfs://<HOSTNAME>/hbase/<TABLE_NAME>/45b75d9c4cc1bbef6dec0326a8bc42a1
          1941609     hdfs://<HOSTNAME>/hbase/<TABLE_NAME>/66c227828b9404762cea8f7d8890957c
          0           hdfs://<HOSTNAME>/hbase/<TABLE_NAME>/7cc710a1ef01dd5c4805bb8b4ee9f79b
          0           hdfs://<HOSTNAME>/hbase/<TABLE_NAME>/a0f510e86fcc69fbacae68536dd496ae
          1905306     hdfs://<HOSTNAME>/hbase/<TABLE_NAME>/dc9ca1251fd4f1a48392b37c7a7c2fcc
          1971976     hdfs://<HOSTNAME>/hbase/<TABLE_NAME>/f4bfae7e508ca852f54806d954f49029
          

          We've created the patch:

          --- online_merge.rb	2012-02-01 00:11:25.000000000 -0800
          +++ online_merge_fixed.rb	2012-02-01 00:11:14.000000000 -0800
          @@ -266,7 +266,7 @@
           
             # For each family, move all the files one by one
             for family in tableDesc.getFamiliesKeys
          -    HRegion.makeColumnFamilyDirs(fs, tableDir, newHRI, family)
          +    HRegion.makeColumnFamilyDirs(fs, tableDir, newHRI, family) if normalRun
               newFamilyPath = Store.getStoreHomedir(tableDir, newHRI.getEncodedName, family)
               
               for row in toMerge
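
The principle behind this patch, that a dry run must not touch the filesystem, can be shown in isolation. The sketch below is not part of online_merge.rb: `prepare_family_dir` is a hypothetical helper, the `normal_run` flag mirrors the `normalRun` variable in the patch, and a local temporary directory stands in for HDFS.

```ruby
require "fileutils"
require "tmpdir"

# Create the family directory only on a real run; on a dry run just report
# what would have been created. Returns the path in both cases.
def prepare_family_dir(region_dir, family, normal_run)
  path = File.join(region_dir, family)
  if normal_run
    FileUtils.mkdir_p(path)
  else
    puts "dry run: would create #{path}"
  end
  path
end

root = Dir.mktmpdir("dryrun")
dry_path = prepare_family_dir(File.join(root, "new_region"), "c", false)
real_path = prepare_family_dir(File.join(root, "other_region"), "c", true)
puts "dry run created directory: #{Dir.exist?(dry_path)}"
puts "normal run created directory: #{Dir.exist?(real_path)}"
```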
          
          stack added a comment -

          Alexey, would you mind uploading a version of online_merge.rb with the patch included, to make it easier for others using the script? Thank you.

          Lars Hofhansl added a comment -

          Moving out of 0.94

          stack added a comment -

          Making critical though we'll probably end up punting it.

          Ted Yu added a comment -

          Marking this as a duplicate of HBASE-7403.
          This feature is desirable for 0.96 but shouldn't block 0.96.


            People

            • Assignee:
              stack
              Reporter:
              ryan rawson
            • Votes:
              9
              Watchers:
              27
