HBase / HBASE-4277

HRS.closeRegion should be able to close regions with only the encoded name

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.90.4
    • Fix Version/s: 0.90.5
    • Component/s: None
    • Labels:
      None

      Description

      As suggested by Stack in HBASE-4217, creating a new issue to provide a patch for the 0.90.x version.

      We had some sort of an outage this morning due to a few racks losing power, and some regions were left in the following state:

      ERROR: Region UNKNOWN_REGION on sv4r17s9:60020, key=e32bbe1f48c9b3633c557dc0291b90a3, not on HDFS or in META but deployed on sv4r17s9:60020

      That region was deleted by the master but the region server never got the memo. Right now there's no way to force close it because HRS.closeRegion requires an HRI and the only way to create one is to get it from .META. which in our case doesn't contain a row for that region. Basically we have to wait until that server is dead to get rid of the region and make hbck happy.

      The required change is to have closeRegion accept an encoded name in both HBA (when the RS address is provided) and HRS, since the region server can find the region anyway from its list of live regions.
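
      The idea above can be sketched as follows. This is a minimal illustration, not the code from the attached patch: the class EncodedNameCloseSketch, its Region type, and the method signatures are hypothetical stand-ins, and the only assumption taken from the description is that the region server keeps a list of live regions that can be searched by encoded name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a region server; this is NOT HBase's HRegionServer. */
public class EncodedNameCloseSketch {

    /** Hypothetical stand-in for an open region. */
    public static final class Region {
        final String encodedName;
        boolean closed = false;

        Region(String encodedName) {
            this.encodedName = encodedName;
        }

        void close() {
            closed = true;
        }
    }

    // Live regions hosted by this server, keyed by encoded name (assumed layout).
    private final Map<String, Region> onlineRegions = new ConcurrentHashMap<>();

    /** Registers a region as live on this server. */
    public void open(String encodedName) {
        onlineRegions.put(encodedName, new Region(encodedName));
    }

    /**
     * Closes a region using only its encoded name. No lookup in a catalog
     * table is needed because the server consults its own live-region map.
     * Returns false when the region is not hosted here.
     */
    public boolean closeRegion(String encodedName) {
        Region region = onlineRegions.remove(encodedName);
        if (region == null) {
            return false;
        }
        region.close();
        return true;
    }

    /** Number of regions still live on this server. */
    public int onlineCount() {
        return onlineRegions.size();
    }

    public static void main(String[] args) {
        EncodedNameCloseSketch rs = new EncodedNameCloseSketch();
        rs.open("e32bbe1f48c9b3633c557dc0291b90a3");
        // First close finds the region in the live map and closes it.
        System.out.println(rs.closeRegion("e32bbe1f48c9b3633c557dc0291b90a3"));
        // Second close finds nothing, because the region is no longer hosted.
        System.out.println(rs.closeRegion("e32bbe1f48c9b3633c557dc0291b90a3"));
    }
}
```

      Keying the map by encoded name is what makes the close possible when .META. has no row for the region: the caller never has to build an HRI.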

      If this is needed in a 0.90 version, we should maybe do that in another issue.

      1. HBASE-4277_0.90.patch
        16 kB
        ramkrishna.s.vasudevan

          Activity

          ramkrishna.s.vasudevan added a comment -

          Integrated to 0.90.5. Thanks for your reviews Stack and Ted.

          stack added a comment -

          ++1 on commit (smile)

          ramkrishna.s.vasudevan added a comment -

          Tests pass, apart from the following:

            testClockSkewDetection(org.apache.hadoop.hbase.master.TestClockSkewDetection): hostname can't be null
            testScanner(org.apache.hadoop.hbase.regionserver.TestScanner): hostname can't be null
          

          These failures are not due to the patch for this JIRA.

          stack added a comment -

          +1 on commit

          ramkrishna.s.vasudevan added a comment -

          The patch applies as is. If that is OK, I can go ahead and commit it. Please share your comments.

          ramkrishna.s.vasudevan added a comment -

          I will check whether the patch still applies; if not, I will prepare an updated one and then commit it.
          Thanks for your review, Stack.

          stack added a comment -

          +1 on commit to 0.90. I brought up a patched server and it was able to take loads fine. I then talked to it with an UNPATCHED client doing moves and gets. Seems fine.

          stack added a comment -

          This is a 0.90.5 issue, not for 0.92.0. Moving it out.

          ramkrishna.s.vasudevan added a comment -

          For rolling-restart verification, the following steps were taken:
          -> Start 2 RS on a 0.90.x version, one with the patch and the other without.
          -> Use a client with the patched version.
          -> Call the new API added in this issue for a region on the patched RS.
          This works fine.
          -> Call it for a region on the unpatched RS.
          We get a NoSuchMethod exception.
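
          The outcome of these steps can be reproduced in miniature. The sketch below assumes the server side resolves an incoming call by method name through reflection; that is an assumption made for illustration and is not stated in this thread. The classes UnpatchedServer and PatchedServer and the dispatch helper are hypothetical and are not HBase types.

```java
import java.lang.reflect.Method;

/** Hypothetical miniature of a patched client calling patched and unpatched servers. */
public class RollingRestartSketch {

    /** Hypothetical unpatched server: it has no closeRegion(String) method. */
    public static class UnpatchedServer {
        public String getServerName() {
            return "unpatched";
        }
    }

    /** Hypothetical patched server: it adds closeRegion(String encodedName). */
    public static class PatchedServer extends UnpatchedServer {
        public boolean closeRegion(String encodedName) {
            return true;
        }
    }

    /**
     * Resolves the requested method by name on the server object, the way a
     * reflection-based dispatcher would (assumed behavior), and invokes it.
     * Returns the result as a string, or the failure kind.
     */
    public static String dispatch(Object server, String methodName, String arg) {
        try {
            Method m = server.getClass().getMethod(methodName, String.class);
            return String.valueOf(m.invoke(server, arg));
        } catch (NoSuchMethodException e) {
            // The server's class predates the new method.
            return "NoSuchMethodException";
        } catch (ReflectiveOperationException e) {
            return "error: " + e;
        }
    }

    public static void main(String[] args) {
        // Patched client calling the patched server: the method is found.
        System.out.println(dispatch(new PatchedServer(), "closeRegion", "abc"));
        // Patched client calling the unpatched server: the lookup fails.
        System.out.println(dispatch(new UnpatchedServer(), "closeRegion", "abc"));
    }
}
```

          Under that assumption, only the new call fails against the unpatched server; methods that exist on both sides continue to resolve.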

          stack added a comment -

          Patch looks good to me (Thanks for doing this RAM). Marking critical on 0.90.5. Before committing, I'd like to check that the addition of new methods to the Interface does not break rolling restart.

          Ted Yu added a comment -

          +1 on patch.


            People

            • Assignee:
              ramkrishna.s.vasudevan
              Reporter:
              ramkrishna.s.vasudevan
            • Votes:
              1
              Watchers:
              3
