HBase
  1. HBase
  2. HBASE-8259

Snapshot backport in 0.94.6 breaks rolling restarts

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.94.6
    • Fix Version/s: 0.94.6.1, 0.94.7
    • Component/s: None
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      Aleksandr Shulman found with his nifty QA tools that 0.94.6 has an incompatible change due to HBASE-7360 (Snapshot 0.94 Backport) that breaks rolling restarts.

      RegionTransitionData.write() uses eventType.ordinal() that is the index in the enum and not the value specified in the enum definition. It means we can't add new states in the middle of the list. This can be fixed by moving C_M_SNAPSHOT_TABLE and C_M_RESTORE_SNAPSHOT at the end of the list. Trunk does the right thing already.

      Right now, RIT znodes created with 0.94.6 (or top of the branch) will use the wrong value for the event type. You will see things like:

      2013-04-03 14:57:25,197 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13dd1e10dbd0004 Attempting to transition node 70236052/-ROOT- from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
      2013-04-03 14:57:25,201 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x13dd1e10dbd0004 Attempt to transition the unassigned node for 70236052 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING failed, the node existed but was in the state M_SERVER_SHUTDOWN set by the server 192.168.1.112,60020,1365026237977
      2013-04-03 14:57:25,201 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed transition from OFFLINE to OPENING for region=70236052
      2013-04-03 14:57:25,201 WARN org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Region was hijacked? It no longer exists, encodedName=70236052
      

      We should roll a 0.94.6.1 or 0.94.7 as soon this is fixed IMO.

      1. HBASE-8259-v0.patch
        3 kB
        Matteo Bertozzi

        Activity

        Hide
        Matteo Bertozzi added a comment -

        the patch moves the snapshots events to the end, and changes the wrong comment about the enum compatibility.

        Show
        Matteo Bertozzi added a comment - the patch moves the snapshots events to the end, and changes the wrong comment about the enum compatibility.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12576864/HBASE-8259-v0.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        -1 patch. The patch command could not apply the patch.

        Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5118//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall . Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12576864/HBASE-8259-v0.patch against trunk revision . +1 @author . The patch does not contain any @author tags. -1 tests included . The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. -1 patch . The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5118//console This message is automatically generated.
        Hide
        Ted Yu added a comment - - edited

        Nice catch.

        +    // and not the value specified in the enum definition. so we can't add stuff in the middle.
        +    C_M_SNAPSHOT_TABLE        (48),   // Client asking Master to snapshot an offline table
        +    C_M_RESTORE_SNAPSHOT      (49);   // Client asking Master to snapshot an offline table
        

        Should the numeric values for the above two enum's be increased after the movement ?

        I understand the numeric values (48, 49 currently) are not used. The question is just for readability.

        Show
        Ted Yu added a comment - - edited Nice catch. + // and not the value specified in the enum definition. so we can't add stuff in the middle. + C_M_SNAPSHOT_TABLE (48), // Client asking Master to snapshot an offline table + C_M_RESTORE_SNAPSHOT (49); // Client asking Master to snapshot an offline table Should the numeric values for the above two enum's be increased after the movement ? I understand the numeric values (48, 49 currently) are not used. The question is just for readability.
        Hide
        Matteo Bertozzi added a comment -

        Should the numeric value for the above two enum's be increased after the movement ?

        It doesn't really matter, the numeric value is never used...

        RegionTransitionData.write() uses eventType.ordinal() that is the index in the enum and not the value specified in the enum definition.

        but I like the...

           * <p>We give the enums indices so we can add types later and keep them
           * grouped together rather than have to add them always to the end as we
           * would have to if we used raw enum ordinals.
        
        Show
        Matteo Bertozzi added a comment - Should the numeric value for the above two enum's be increased after the movement ? It doesn't really matter, the numeric value is never used... RegionTransitionData.write() uses eventType.ordinal() that is the index in the enum and not the value specified in the enum definition. but I like the... * <p>We give the enums indices so we can add types later and keep them * grouped together rather than have to add them always to the end as we * would have to if we used raw enum ordinals.
        Hide
        Jean-Daniel Cryans added a comment -

        but I like the...

        FWIW that code in trunk doesn't change so whatever we do in 0.94 doesn't really matter. I'd personally keep the numbers the same just because it's like in trunk.

        I tested the patch on a 0.94.6 master with a 0.94.5 region server and it works. +1

        Show
        Jean-Daniel Cryans added a comment - but I like the... FWIW that code in trunk doesn't change so whatever we do in 0.94 doesn't really matter. I'd personally keep the numbers the same just because it's like in trunk. I tested the patch on a 0.94.6 master with a 0.94.5 region server and it works. +1
        Hide
        Lars Hofhansl added a comment -

        Argghh. +1 on patch.

        Show
        Lars Hofhansl added a comment - Argghh. +1 on patch.
        Hide
        Lars Hofhansl added a comment -

        This makes me want to release 0.94.7 soon.

        Show
        Lars Hofhansl added a comment - This makes me want to release 0.94.7 soon.
        Hide
        Jean-Daniel Cryans added a comment -

        Lars Hofhansl that's why I'm proposing doing a 0.94.6.1 with just this patch.

        Show
        Jean-Daniel Cryans added a comment - Lars Hofhansl that's why I'm proposing doing a 0.94.6.1 with just this patch.
        Hide
        Hudson added a comment -

        Integrated in HBase-0.94 #940 (See https://builds.apache.org/job/HBase-0.94/940/)
        HBASE-8259 Snapshot backport in 0.94.6 breaks rolling restarts (Revision 1464562)

        Result = FAILURE
        mbertozzi :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
        Show
        Hudson added a comment - Integrated in HBase-0.94 #940 (See https://builds.apache.org/job/HBase-0.94/940/ ) HBASE-8259 Snapshot backport in 0.94.6 breaks rolling restarts (Revision 1464562) Result = FAILURE mbertozzi : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
        Hide
        Lars Hofhansl added a comment -

        I can do a 0.94.6.1 with just this patch. Would just branch of the 0.94.6 branch. Comments?

        Show
        Lars Hofhansl added a comment - I can do a 0.94.6.1 with just this patch. Would just branch of the 0.94.6 branch. Comments?
        Hide
        Ted Yu added a comment -

        What level of verification effort are you expecting ?
        If normal procedure of validating a release is involved, 0.94.7 release seems to be a better fit.

        Show
        Ted Yu added a comment - What level of verification effort are you expecting ? If normal procedure of validating a release is involved, 0.94.7 release seems to be a better fit.
        Hide
        Jean-Daniel Cryans added a comment -

        Yep +1.

        Show
        Jean-Daniel Cryans added a comment - Yep +1.
        Hide
        Andrew Purtell added a comment -

        I can do a 0.94.6.1 with just this patch. Would just branch of the 0.94.6 branch. Comments?

        +1

        Show
        Andrew Purtell added a comment - I can do a 0.94.6.1 with just this patch. Would just branch of the 0.94.6 branch. Comments? +1
        Hide
        Lars Hofhansl added a comment -

        Ted Yu None other than running the full test-suite, as this is the already released 0.94.6 branch + only this patch (which J-D validated).

        That is of course another option: Move all current 0.94.7 issues to 0.94.8 and release an RC for 0.94.7 today. I can do that too. That would take at least a week to be validated, if no issues are found.

        +0.5 for 0.94.6.1. It would show our commitment to quality (I know that sounds like a marketing slogan)

        Show
        Lars Hofhansl added a comment - Ted Yu None other than running the full test-suite, as this is the already released 0.94.6 branch + only this patch (which J-D validated). That is of course another option: Move all current 0.94.7 issues to 0.94.8 and release an RC for 0.94.7 today. I can do that too. That would take at least a week to be validated, if no issues are found. +0.5 for 0.94.6.1. It would show our commitment to quality (I know that sounds like a marketing slogan)
        Hide
        Lars Hofhansl added a comment -

        stack Any comments?

        I'll make a 0.94.6.1 branch now, apply the patch, and run the test-suite. If we decide otherwise I'll just delete the branch, and we'll do a 0.94.7RC today.

        Show
        Lars Hofhansl added a comment - stack Any comments? I'll make a 0.94.6.1 branch now, apply the patch, and run the test-suite. If we decide otherwise I'll just delete the branch, and we'll do a 0.94.7RC today.
        Hide
        stack added a comment -

        I think 0.94.6.1 is a little silly given the nice cadence you have going Lars Hofhansl Why not just do 0.94.7?

        Show
        stack added a comment - I think 0.94.6.1 is a little silly given the nice cadence you have going Lars Hofhansl Why not just do 0.94.7?
        Hide
        Lars Hofhansl added a comment -

        OK... Let me clean up the 0.94.7 issues, and I'll do an RC today.

        Show
        Lars Hofhansl added a comment - OK... Let me clean up the 0.94.7 issues, and I'll do an RC today.
        Hide
        Lars Hofhansl added a comment -

        Arrghh... Let's wait out discussion on DEV list.

        Show
        Lars Hofhansl added a comment - Arrghh... Let's wait out discussion on DEV list.
        Hide
        Lars Hofhansl added a comment -

        Matteo committed to 0.94.7, I committed to 0.94.6.1, resolving.

        Show
        Lars Hofhansl added a comment - Matteo committed to 0.94.7, I committed to 0.94.6.1, resolving.
        Hide
        Hudson added a comment -

        Integrated in HBase-0.94-security #130 (See https://builds.apache.org/job/HBase-0.94-security/130/)
        HBASE-8259 Snapshot backport in 0.94.6 breaks rolling restarts (Revision 1464562)

        Result = FAILURE
        mbertozzi :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
        Show
        Hudson added a comment - Integrated in HBase-0.94-security #130 (See https://builds.apache.org/job/HBase-0.94-security/130/ ) HBASE-8259 Snapshot backport in 0.94.6 breaks rolling restarts (Revision 1464562) Result = FAILURE mbertozzi : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
        Hide
        Hudson added a comment -

        Integrated in HBase-0.94-security-on-Hadoop-23 #13 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/13/)
        HBASE-8259 Snapshot backport in 0.94.6 breaks rolling restarts (Revision 1464562)

        Result = FAILURE
        mbertozzi :
        Files :

        • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
        Show
        Hudson added a comment - Integrated in HBase-0.94-security-on-Hadoop-23 #13 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/13/ ) HBASE-8259 Snapshot backport in 0.94.6 breaks rolling restarts (Revision 1464562) Result = FAILURE mbertozzi : Files : /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/executor/EventHandler.java
        Hide
        Fengdong Yu added a comment -

        All mirror sites are not updated.

        Does there has command to trigger mirror sites update?

        Show
        Fengdong Yu added a comment - All mirror sites are not updated. Does there has command to trigger mirror sites update?
        Hide
        Lars Hofhansl added a comment -

        It takes time, they should all have the files by tomorrow. I only waited until the canonical repositories were updated.

        Show
        Lars Hofhansl added a comment - It takes time, they should all have the files by tomorrow. I only waited until the canonical repositories were updated.
        Hide
        Fengdong Yu added a comment -

        Lars,
        you can download it from backup sites, they are all updated.
        http://www.eu.apache.org/dist/hbase/
        http://www.us.apache.org/dist/hbase/

        Show
        Fengdong Yu added a comment - Lars, you can download it from backup sites, they are all updated. http://www.eu.apache.org/dist/hbase/ http://www.us.apache.org/dist/hbase/
        Hide
        Lars Hofhansl added a comment -

        Yep, these are what I meant with "canonical repositories"

        Show
        Lars Hofhansl added a comment - Yep, these are what I meant with "canonical repositories"

          People

          • Assignee:
            Matteo Bertozzi
            Reporter:
            Jean-Daniel Cryans
          • Votes:
            0 Vote for this issue
            Watchers:
            13 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development