HBASE-4341

HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.90.4
    • Fix Version/s: 0.90.5
    • Component/s: regionserver
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      This is why the build "https://builds.apache.org/job/hbase-0.90/282" failed. In that run, one test case timed out, which caused the whole test process to be killed.

      [logs]
      Here are the related logs (from org.apache.hadoop.hbase.mapreduce.TestTableMapReduce-output.txt):

      2011-08-31 10:09:01,089 INFO  [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] regionserver.Leases(124): RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closing leases
      2011-08-31 10:09:01,089 INFO  [RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker] regionserver.Leases(131): RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closed leases
      2011-08-31 10:09:01,403 INFO  [RegionServer:0;vesta.apache.org,52257,1314785332968] regionserver.HRegionServer(709): Waiting on 1 regions to close
      2011-08-31 10:09:01,403 DEBUG [RegionServer:0;vesta.apache.org,52257,1314785332968] regionserver.HRegionServer(713): {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
      2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] master.ServerManager(465): Waiting on regionserver(s) to go down vesta.apache.org,52257,1314785332968
      2011-08-31 10:09:02,697 INFO  [Master:0;vesta.apache.org:50036] master.ServerManager(465): Waiting on regionserver(s) to go down vesta.apache.org,52257,1314785332968
      2011-08-31 10:09:03,008 INFO  [vesta.apache.org:50036.timeoutMonitor] hbase.Chore(79): vesta.apache.org:50036.timeoutMonitor exiting
      2011-08-31 10:09:03,697 INFO  [Master:0;vesta.apache.org:50036] master.ServerManager(465): Waiting on regionserver(s) to go down vesta.apache.org,52257,1314785332968
      2011-08-31 10:09:04,697 INFO  [Master:0;vesta.apache.org:50036] master.ServerManager(465): Waiting on regionserver(s) to go down vesta.apache.org,52257,1314785332968
      2011-08-31 10:09:05,698 INFO  [Master:0;vesta.apache.org:50036] master.ServerManager(465): Waiting on regionserver(s) to go down vesta.apache.org,52257,1314785332968
      2011-08-31 10:09:06,698 INFO  [Master:0;vesta.apache.org:50036] master.ServerManager(465): Waiting on regionserver(s) to go down vesta.apache.org,52257,1314785332968
      2011-08-31 10:09:07,698 INFO  [Master:0;vesta.apache.org:50036] master.ServerManager(465): Waiting on regionserver(s) to go down vesta.apache.org,52257,1314785332968
      

      [Analysis]
      One region was opened while the RS was stopping.
      This is the method "HRS#closeAllRegions":

        protected void closeAllRegions(final boolean abort) {
          closeUserRegions(abort);
          // A region that comes online after the traversal above has started can be missed at this point.
          if (meta != null) closeRegion(meta.getRegionInfo(), abort, false);
          if (root != null) closeRegion(root.getRegionInfo(), abort, false);
        }
      

      HRS#onlineRegions is a ConcurrentHashMap, so its iterators are only weakly consistent: a traversal of the map may not see entries that are added while the traversal is in progress. Once a region is missed this way, it is never closed, and the regionserver cannot stop normally. The following logs then appear:

      2011-08-31 10:09:01,403 INFO  [RegionServer:0;vesta.apache.org,52257,1314785332968] regionserver.HRegionServer(709): Waiting on 1 regions to close
      2011-08-31 10:09:01,403 DEBUG [RegionServer:0;vesta.apache.org,52257,1314785332968] regionserver.HRegionServer(713): {74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
      2011-08-31 10:09:01,697 INFO  [Master:0;vesta.apache.org:50036] master.ServerManager(465): Waiting on regionserver(s) to go down vesta.apache.org,52257,1314785332968
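
      The failure mode described in the analysis can be sketched in isolation. The snippet below is only an illustration, not the attached patch: the class name, the closePass and closeUntilEmpty helpers, and the String-to-String map standing in for HRS#onlineRegions are all hypothetical, and closing a region is modelled as simply removing its entry. It shows that a single pass over a ConcurrentHashMap leaves behind an entry that arrives late, while re-checking the map until it is empty does not.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the regionserver; not HBase code.
class CloseAllRegionsSketch {
    // Stand-in for HRS#onlineRegions: encoded region name -> region name.
    static final Map<String, String> onlineRegions = new ConcurrentHashMap<>();

    // One traversal of the map. "Closing" a region is modelled as removing it.
    // Removing entries while iterating a ConcurrentHashMap view is safe.
    static int closePass() {
        int closed = 0;
        for (String encodedName : onlineRegions.keySet()) {
            if (onlineRegions.remove(encodedName) != null) {
                closed++;
            }
        }
        return closed;
    }

    // Repeat the traversal until the map is observed empty, so that an entry
    // added during or after an earlier pass is still picked up.
    static int closeUntilEmpty() {
        int passes = 0;
        while (!onlineRegions.isEmpty()) {
            closePass();
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        onlineRegions.put("r1", "mrtest,aaa");
        onlineRegions.put("r2", "mrtest,mmm");

        closePass();
        // Simulate an open that lands just after the single traversal.
        onlineRegions.put("r3", "mrtest,uuu");
        System.out.println("left after single pass: " + onlineRegions.size());

        closeUntilEmpty();
        System.out.println("left after re-checking: " + onlineRegions.size());
    }
}
```

      Run as written, the first line reports 1 region left and the second reports 0. In a real server the late open happens concurrently rather than between two statements, so a fix along these lines would also have to decide what to do with regions that are still in the middle of opening.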
      

        Activity

        Jieshan Bean created issue -
        stack added a comment -

        The above analysis makes sense to me. You have a patch Jieshan?

        Jieshan Bean added a comment -

        I'm trying to make the patch. Hope I can submit it today.

        stack added a comment -

        You are a good man.

        Jieshan Bean made changes -
        Field Original Value New Value
        Attachment HBASE-4341-Branch.patch [ 12493558 ]
        Ted Yu added a comment -

        The patch is reasonable.

        stack added a comment -

        I agree.

        stack added a comment -

        Applied to branch and trunk. Thank you for the patch Jieshan.

        stack made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Hadoop Flags [Reviewed]
        Resolution Fixed [ 1 ]
        Hudson added a comment -

        Integrated in HBase-TRUNK #2188 (See https://builds.apache.org/job/HBase-TRUNK/2188/)
        HBASE-4341 HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency

        stack :
        Files :

        • /hbase/trunk/CHANGES.txt
        • /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
        stack made changes -
        Fix Version/s 0.90.6 [ 12319200 ]
        Fix Version/s 0.90.5 [ 12317145 ]
        stack made changes -
        Fix Version/s 0.90.5 [ 12317145 ]
        Fix Version/s 0.90.6 [ 12319200 ]

          People

          • Assignee:
            Jieshan Bean
            Reporter:
            Jieshan Bean
