HBASE-4643

Consider reverting HBASE-451 (change HRI to remove HTD) in 0.92

    Details

    • Type: Brainstorming
    • Status: Resolved
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.92.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      I've been chatting with some folks recently about this thought: it seems like, if you enumerate the larger changes in 0.92, this is probably the one that is the most destabilizing that hasn't been through a lot of "baking" yet. You can see this evidenced by the very high number of followup commits it generated: looks like somewhere around 15 of them, plus some bugs still open.

      I've done a patch to revert this and the related followup changes on the 0.92 branch. Do we want to consider doing this?

      1. revert.txt
        470 kB
        Todd Lipcon

        Activity

        stack added a comment -

        Resolving no longer pertinent issue

        Todd Lipcon added a comment -

        Jonathan: you have clusters on 0.92 with real data? you're an adventurous guy!

        Jonathan Gray added a comment -

        I've had a few pretty horrible experiences moving an 0.90 cluster to 0.92 so far, so I agree that this is definitely the most unbaked part of 0.92.

        Now that I've got 92 clusters, I'm going to have to figure out a reverting plan for them if we back this out now. It will also become a barrier between 0.92 and 0.94 which will make my life difficult as well (since we have been pulling 94 changes into a local 92 branch).

        I'd like to see if Stack's next changes do the trick before abandoning this.

        stack added a comment -

        I should have a patch for HBASE-4388 up later today. It addresses most of HBASE-4389 too.

        Ted Yu added a comment -

        So HBASE-4389 would be abandoned?
        I think more effort should be put on that JIRA.

        Todd Lipcon added a comment -

        Here's a patch which does the revert. It passed all but three unit tests:

        Results :

        Failed tests: testSizes(org.apache.hadoop.hbase.io.TestHeapSize): expected:<296> but was:<288>

        Tests in error:
        testMasterFailoverWithMockedRITOnDeadRS(org.apache.hadoop.hbase.master.TestMasterFailover): Server not running, aborting
        testTimestamps(org.apache.hadoop.hbase.TestMultiVersions): org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family colfamily1 does not exist in region testTimestamps,,1319176310979.39600bf02dc843a3dc6bf8b79567d8c7. in table {NAME => 'testTimestamps', FAMILIES => [{NAME => 'colfamily11', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', MIN_VERSIONS => '0', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

        It would need more work to be committed, but I figured I'd post it just in case. It also reverts HBASE-3446 to make the revert simpler, so that change would have to be re-incorporated.

        Todd Lipcon added a comment -

        "One major issue HBASE-451 solves is that too much heap is consumed on the master if the number of regions in the cluster is high."

        That's really really easy to solve with a simple Interning strategy. See http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Interners.html#newWeakInterner()

        It's not so much the particular bugs of upgrade - it's just that this whole thing seems to have been sloppy. The number of bugs in upgrade is just one example.
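
For illustration, the interning idea above could look roughly like the following sketch. It uses a plain `WeakHashMap` from the JDK instead of Guava's `Interners.newWeakInterner()` so that it stands alone. `TableSchema` and `WeakInterner` are hypothetical names made up for this example, not HBase or Guava classes, and this is not code from any patch attached to this issue. The point is only that equal descriptors held by many regions can collapse to one shared instance, which is then garbage-collected once nothing references it.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.Objects;
import java.util.WeakHashMap;

/** Hypothetical stand-in for a table descriptor; NOT HBase's HTableDescriptor. */
final class TableSchema {
    private final String name;
    private final String families;

    TableSchema(String name, String families) {
        this.name = name;
        this.families = families;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof TableSchema)) return false;
        TableSchema other = (TableSchema) o;
        return name.equals(other.name) && families.equals(other.families);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, families);
    }
}

/** Minimal weak interner (illustrative only): one canonical instance per equal value. */
final class WeakInterner<T> {
    // Keys are held weakly by WeakHashMap; values are weak references to the same
    // object, so the pool never keeps a canonical instance alive on its own.
    private final Map<T, WeakReference<T>> pool = new WeakHashMap<>();

    synchronized T intern(T sample) {
        WeakReference<T> ref = pool.get(sample);
        T canonical = (ref == null) ? null : ref.get();
        if (canonical == null) {
            // First time this value is seen (or the old instance was collected).
            pool.put(sample, new WeakReference<>(sample));
            canonical = sample;
        }
        return canonical;
    }
}

final class InternDemo {
    public static void main(String[] args) {
        WeakInterner<TableSchema> interner = new WeakInterner<>();
        // Two regions of the same table each build an equal descriptor...
        TableSchema a = interner.intern(new TableSchema("t1", "cf1"));
        TableSchema b = interner.intern(new TableSchema("t1", "cf1"));
        // ...but after interning they share a single object on the heap.
        System.out.println(a == b); // prints "true"
    }
}
```

With something like this, each region's info object would hold a reference to the interned descriptor, so master heap would grow with the number of distinct tables rather than the number of regions.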

        Ted Yu added a comment -

        Have we proven that no patch from HBASE-4388 works?
        Stack is halfway through composing the test in HBASE-4388, which would allow replaying upgrade scenarios.

        One major issue HBASE-451 solves is that too much heap is consumed on the master if the number of regions in the cluster is high.

        If we really cannot get upgrade done (the chance is slim), we have the following alternative:
        1. back up the old /hbase HDFS tree
        2. boot up 0.92 from a clean HDFS
        3. copy the data backed up in step 1 to the same old location
        4. use the tool from HBASE-4377 to regenerate the .META. table


          People

          • Assignee:
            Unassigned
            Reporter:
            Todd Lipcon