HBase / HBASE-6537

Race between balancer and disable table can lead to inconsistent cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.94.0
    • Fix Version/s: 0.94.2
    • Component/s: master
    • Labels:
      None

      Description

      This appears in 0.94; trunk is not affected.
      The balancer collects the region plans to move (unassign, then assign).
      Before the unassign happens, a disable-table request arrives.
      After the region is closed on the region server, the master deletes the znode, removes the region from RIT (regions in transition),
      and then clears the region from the online regions.

      Between removing the region from RIT and clearing it from the online regions,
      the balancer begins to unassign. It gets a NotServingRegionException, and if the table is disabling, the master handles the state and deletes the znode. However, the table is already disabled by this point, so the RIT entry and the znode remain. TimeoutMonitor does not handle this case.

      This blocks enabling the table and running the balancer until a restart.
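
      The interleaving above can be replayed in a small single-threaded model. This is only a sketch under stated assumptions, not the AssignmentManager code: the class, field, and method names are hypothetical, and it assumes that the balancer's unassign creates both the RIT entry and the znode before the NotServingRegionException arrives. It compares a cleanup guard that checks only the disabling state with one that also accepts the disabled state, which is the check discussed in the comments on this issue.

```java
import java.util.HashSet;
import java.util.Set;

public class DisableRaceSketch {
    enum TableState { ENABLED, DISABLING, DISABLED }

    // Hypothetical stand-ins for the master-side bookkeeping.
    final Set<String> regionsInTransition = new HashSet<>();
    final Set<String> znodes = new HashSet<>();
    TableState tableState = TableState.ENABLED;

    /** Cleanup path taken when unassign hits NotServingRegionException. */
    void onNotServingRegion(String region, boolean alsoCheckDisabled) {
        boolean cleanup = tableState == TableState.DISABLING
            || (alsoCheckDisabled && tableState == TableState.DISABLED);
        if (cleanup) {
            regionsInTransition.remove(region);
            znodes.remove(region);
        }
    }

    /** Replays the interleaving; returns true if the region is left stuck. */
    static boolean replay(boolean alsoCheckDisabled) {
        DisableRaceSketch master = new DisableRaceSketch();
        String region = "r1";
        // The disable has completed by the time the balancer acts.
        master.tableState = TableState.DISABLED;
        // Assumption: the balancer's unassign creates the RIT entry and the znode.
        master.regionsInTransition.add(region);
        master.znodes.add(region);
        // The region server no longer serves the region.
        master.onNotServingRegion(region, alsoCheckDisabled);
        return master.regionsInTransition.contains(region)
            || master.znodes.contains(region);
    }

    public static void main(String[] args) {
        System.out.println("disabling-only check, region stuck: " + replay(false));
        System.out.println("disabling-or-disabled check, region stuck: " + replay(true));
    }
}
```

      In this model the disabling-only guard leaves the entry behind, while the wider guard clears it. The model says nothing about the window between the RIT and online-region updates, which is a separate concern.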

      1. HBASE-6537-94-v2.patch
        1 kB
        zhou wenjian
      2. HBASE-6537-trunk-v2.patch
        2 kB
        Zhou wenjian
      3. HBASE-6537-trunk.patch
        2 kB
        Zhou wenjian

          Activity

          rajeshbabu added a comment -

          @Zhou wenjian
          Clearing from online regions before removing from RIT may not solve the problem completely.
          It's better to check whether the table is disabling or disabled in the NotServingRegionException case.

          Maybe this applies to trunk as well, because there we also check only whether the table is disabling.
          Please correct me if I am wrong.

          Zhou wenjian added a comment -

          @rajeshbabu
          Thanks for your reply.

          I agree: checking whether the table is disabling or disabled in the NotServingRegionException case will also prevent the issue.

          There is no problem in trunk because it handles this scenario well, but since we will check for both the disabling and disabled table states, the patch is applicable to trunk too.

          Ted Yu added a comment -

          @Wenjian:
          Can you run trunk patch through test suite and post back the results ?

          See INFRA-5131 for progress on solving JIRA malfunction.

          Zhou wenjian added a comment -

          @Ted
          We have no Hudson for trunk, only a local one for 0.94.

          ramkrishna.s.vasudevan added a comment -

          @Zhou
          I think only the check for Disabling should be enough, right? Do you think the change in regionOffline is needed? Thanks Zhou.

          Zhou wenjian added a comment -

          @ramkrishna.s.vasudevan

          With the change we can close the window between the RIT update and the regions update; that window may cause lots of problems.
          Since trunk has made the function synchronized, the change in regionOffline does no harm.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12540261/HBASE-6537-94-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2624//console

          This message is automatically generated.

          stack added a comment -

          Patch seems fine to me. Ram, are you good w/ it? (Or Rajesh?)

          ramkrishna.s.vasudevan added a comment -

          +1 Stack.

          Lars Hofhansl added a comment -

          Issue says this is for 0.94 only, but the patches are against trunk. Confusing...
          +1 on patch otherwise.

          rajeshbabu added a comment -

          +1 Stack
          @Lars,
          Initially Zhou attached patches for 94 only, but now I am not able to find these attachments.
          https://issues.apache.org/jira/secure/attachment/12540000/HBASE-6537-94.patch
          https://issues.apache.org/jira/secure/attachment/12540261/HBASE-6537-94-v2.patch

          Zhou wenjian added a comment -

          @Lars & rajeshbabu.
          I found I could not get Hudson to run it, and I do not know why, so I just deleted the 94 attachment. I'll resubmit it if necessary.

          Lars Hofhansl added a comment -

          HadoopQA can only run against trunk. So if you want a HadoopQA run you need a trunk patch.
          But since this issue does not occur in trunk (as you say) we only really need the 0.94 patch.

          Lars Hofhansl added a comment -

          I'm happy to make a 0.94 patch. But just to be doubly sure, we only want this in 0.94, right?

          Lars Hofhansl added a comment -

          Hmm... There was no RegionStates in 0.94. (added in the ginormous HBASE-6272 patch)

          zhou wenjian added a comment -

          @Lars, thanks for the reply.
          Updated the patch for 94.

          Hadoop QA added a comment -

          -1 overall. Here are the results of testing the latest attachment
          http://issues.apache.org/jira/secure/attachment/12543222/HBASE-6537-94-v2.patch
          against trunk revision .

          +1 @author. The patch does not contain any @author tags.

          -1 tests included. The patch doesn't appear to include any new or modified tests.
          Please justify why no new tests are needed for this patch.
          Also please list what manual steps were performed to verify this patch.

          -1 patch. The patch command could not apply the patch.

          Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2746//console

          This message is automatically generated.

          Lars Hofhansl added a comment -

          Committed to 0.94.
          Thank you for the patch Zhou.

          Hudson added a comment -

          Integrated in HBase-0.94 #444 (See https://builds.apache.org/job/HBase-0.94/444/)
          HBASE-6537 Race between balancer and disable table can lead to inconsistent cluster (Zhou wenjian) (Revision 1379277)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          Hudson added a comment -

          Integrated in HBase-0.94-security #51 (See https://builds.apache.org/job/HBase-0.94-security/51/)
          HBASE-6537 Race between balancer and disable table can lead to inconsistent cluster (Zhou wenjian) (Revision 1379277)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
          Hudson added a comment -

          Integrated in HBase-0.94-security-on-Hadoop-23 #7 (See https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/7/)
          HBASE-6537 Race between balancer and disable table can lead to inconsistent cluster (Zhou wenjian) (Revision 1379277)

          Result = FAILURE
          larsh :
          Files :

          • /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java

            People

            • Assignee:
              Zhou wenjian
            • Reporter:
              Zhou wenjian
            • Votes:
              0
            • Watchers:
              13