HBase / HBASE-19335

Fix waitUntilAllRegionsAssigned


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0-beta-1, 2.0.0
    • None

    Description

      Found while debugging the flaky test TestRegionObserverInterface#testRecovery.
      At the end, the test does the following:

      • Kills the RS
      • Waits for all regions to be assigned
      • Some validation (unrelated)
      • Cleanup: delete table.

      The relevant excerpt from the end of the test:

            // ... (start of testRecovery(), including the opening try block, omitted)
            cluster.killRegionServer(rs1.getRegionServer().getServerName());
            Threads.sleep(1000); // Let the kill soak in.
            util.waitUntilAllRegionsAssigned(tableName);
            LOG.info("All regions assigned");

            verifyMethodResult(SimpleRegionObserver.class,
                new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", "getCtPreWALRestore",
                    "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" },
                tableName, new Integer[] { 1, 1, 2, 2, 0, 0 });
          } finally {
            util.deleteTable(tableName);
            table.close();
          }
        }


      However, the test logs showed Assigns overlapping with Unassigns. As a result, regions ended up 'stuck in RIT' (region in transition) and the test timed out.
      The Assigns came from server crash recovery, and the Unassigns came from the deleteTable cleanup.
      This raises the question: why did HBTU.waitUntilAllRegionsAssigned(tableName) (HBTU = HBaseTestingUtility) not wait until recovery was complete?
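      A poll-until-true wait exits as soon as its condition holds, so a wait that returns too early points at the condition rather than at the waiting loop. The following is a minimal, hypothetical sketch of such a wait (PollingWaitSketch and waitFor are illustrative names, not HBase APIs, and this is not how HBaseTestingUtility is implemented): if the condition is already true when the wait starts, it returns on the first poll without sleeping at all.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical poll-until-true helper; a sketch, not the HBase implementation. */
final class PollingWaitSketch {

  /**
   * Evaluates {@code condition} every {@code intervalMs} ms until it returns true
   * or {@code timeoutMs} ms have elapsed. Returns true if the condition was met.
   */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return true; // If the condition already holds, we return without sleeping.
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // Timed out before the condition became true.
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // Preserve the interrupt status.
        return false;
      }
    }
  }

  public static void main(String[] args) {
    AtomicInteger polls = new AtomicInteger();
    // A condition that is already (wrongly) satisfied right after the crash.
    boolean met = waitFor(5_000, 100, () -> {
      polls.incrementAndGet();
      return true;
    });
    System.out.println("met=" + met + ", polls=" + polls.get()); // met=true, polls=1
  }
}
```

      In other words, the wait loop itself behaves correctly; the only way to make it wait through crash recovery is to make the condition false until recovery has actually finished.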

      Answer: the method appears to be written only for the happy path, not for crashes. It scans hbase:meta and only checks that each region's server column has a value. After the crash, that value is obviously still present: it is the server that was just killed. So the check passes immediately, before recovery has reassigned the regions.
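      The difference between the two kinds of check can be sketched under a simplified model, assuming hbase:meta is a map from region name to the value of its server column and that the set of live servers is known. All names here (MetaCheckSketch, naiveAllAssigned, allAssignedToLiveServers, region-a, rs1, ...) are hypothetical, and this is not necessarily how the attached patches fix the method: the naive check passes as soon as every row has a server value, while the stricter check also requires that server to still be alive.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the meta check; region -> server column value. */
final class MetaCheckSketch {

  /** Naive check: every region row has a non-empty server column. */
  static boolean naiveAllAssigned(Map<String, String> metaServerColumn) {
    for (String server : metaServerColumn.values()) {
      if (server == null || server.isEmpty()) {
        return false;
      }
    }
    return true;
  }

  /** Stricter check: the server recorded for each region must also be live. */
  static boolean allAssignedToLiveServers(Map<String, String> metaServerColumn,
                                          Set<String> liveServers) {
    for (String server : metaServerColumn.values()) {
      if (server == null || !liveServers.contains(server)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> meta = new HashMap<>();
    meta.put("region-a", "rs1");
    meta.put("region-b", "rs2");
    Set<String> live = new HashSet<>(Arrays.asList("rs1", "rs2"));

    live.remove("rs1"); // rs1 crashes; its meta row still says "rs1".
    System.out.println(naiveAllAssigned(meta));               // true  (the bug)
    System.out.println(allAssignedToLiveServers(meta, live)); // false (keeps waiting)

    meta.put("region-a", "rs2"); // Crash recovery reassigns region-a.
    System.out.println(allAssignedToLiveServers(meta, live)); // true
  }
}
```

      Even the stricter check is only an approximation: a region can point at a live server while it is still in transition, so a robust wait would probably also need to consult the master's region-in-transition state.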

      This bug probably affects other fault-tolerance tests too, so fixing it will hopefully fix more than just this one test.

      Attachments

        1. HBASE-19335.master.001.patch
          16 kB
          Apekshit Sharma
        2. HBASE-19335.master.002.patch
          15 kB
          Apekshit Sharma

            People

              Assignee: Apekshit Sharma (appy)
              Reporter: Apekshit Sharma (appy)
              Votes: 0
              Watchers: 5
