HBASE-24897

RegionReplicaFlushHandler should handle NoServerForRegionException to avoid aborting RegionServer


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.2.6
    • Component/s: None
    • Labels:
      None

      Description

      While debugging the flaky test TestRegionReplicaReplicationEndpoint, I found that the RS aborted because the flush triggered by RegionReplicaFlushHandler failed. When a new table is created with region replicas, the assignment order may be:

      1. Assign the 0002 replica region and trigger a primary region flush.
      2. Assign the 0001 replica region and trigger a primary region flush.
      3. Assign the primary region.

      The primary region flush may fail because the primary region is not open yet, and the resulting exception can abort the RS.

       

      2020-08-18 16:56:30,041 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0002.66e9757a05fbae7623cfea3369fc8354.
      2020-08-18 16:56:30,558 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463_0001.22ff45423b0f1f0e93794f673449d140.
      2020-08-18 16:56:31,192 INFO [RS_OPEN_REGION-regionserver/hao-OptiPlex-7050:0-0] handler.AssignRegionHandler(141): Opened testRegionReplicaReplicationIgnoresDisabledTables_drop_false_disabledReplication_false,,1597740978463.901f9cd06bbf27ef7c2d70b5af725cd2.
      
      
      2020-08-18 16:58:53,857 ERROR [RS_REGION_REPLICA_FLUSH_OPS-regionserver/hao-OptiPlex-7050:0-0] helpers.MarkerIgnoringBase(159): ***** ABORTING region server hao-optiplex-7050,36368,1597740961432: ServerAborting because an exception was thrown *****
      org.apache.hadoop.hbase.client.NoServerForRegionException: No server address listed in hbase:meta for region testRegionReplicaReplicationWithReplicas_10,,1597741128945.0f541dc1a7ca64797c4cf054adb9edfb. containing row 
        at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegionInMeta(ConnectionImplementation.java:926)
        at org.apache.hadoop.hbase.client.ConnectionImplementation.locateRegion(ConnectionImplementation.java:784)
        at org.apache.hadoop.hbase.client.ConnectionUtils$ShortCircuitingClusterConnection.locateRegion(ConnectionUtils.java:140)
        at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:147)
        at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:98)
        at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:84)
        at org.apache.hadoop.hbase.client.FlushRegionCallable.prepare(FlushRegionCallable.java:62)
        at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
        at org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.triggerFlushInPrimaryRegion(RegionReplicaFlushHandler.java:129)
        at org.apache.hadoop.hbase.regionserver.handler.RegionReplicaFlushHandler.process(RegionReplicaFlushHandler.java:78)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
      

      I think the fix should be to assign the primary region first when the region replica feature is enabled. I will check the implementation of region replicas.
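
      The alternative named in the issue title, handling the exception in the handler instead of aborting, could look roughly like the sketch below. It is only an illustration and not the committed patch: the class FlushRetrySketch, the method triggerFlushWithRetry, and the retry parameters are hypothetical, and the nested exception is a local stand-in for org.apache.hadoop.hbase.client.NoServerForRegionException so that the example runs without HBase on the classpath.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public class FlushRetrySketch {

  /** Local stand-in (assumption) for HBase's NoServerForRegionException. */
  static class NoServerForRegionException extends Exception {
    NoServerForRegionException(String message) {
      super(message);
    }
  }

  /**
   * Calls flushCall until it reports success or maxAttempts is used up.
   * A NoServerForRegionException is treated as "primary not open yet":
   * it is logged and retried instead of being propagated to the caller.
   *
   * @return true if the flush succeeded, false if all attempts were used
   */
  static boolean triggerFlushWithRetry(Callable<Boolean> flushCall, int maxAttempts,
      long pauseMs) throws Exception {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        if (flushCall.call()) {
          return true;
        }
      } catch (NoServerForRegionException e) {
        // The primary has no location in hbase:meta yet; retry later.
        System.out.println("Attempt " + attempt + " failed: " + e.getMessage());
      }
      if (attempt < maxAttempts) {
        Thread.sleep(pauseMs);
      }
    }
    return false;
  }

  public static void main(String[] args) throws Exception {
    AtomicInteger calls = new AtomicInteger();
    // Simulated flush: the primary becomes locatable on the third call.
    boolean flushed = triggerFlushWithRetry(() -> {
      if (calls.incrementAndGet() < 3) {
        throw new NoServerForRegionException("No server address listed in hbase:meta");
      }
      return true;
    }, 5, 10L);
    System.out.println("flushed=" + flushed + ", calls=" + calls.get());
  }
}
```

      In this sketch the caller gets false when the primary never becomes available, so it can decide what to do next rather than letting the exception escape the event handler thread.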

       

              People

              • Assignee:
                zghao Guanghao Zhang
                Reporter:
                zghao Guanghao Zhang