Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-8940

TestRegionMergeTransactionOnCluster#testWholesomeMerge may fail due to race in opening region

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.95.2
    • None
    • None
    • Reviewed

    Description

      From http://54.241.6.143/job/HBase-TRUNK-Hadoop-2/org.apache.hbase$hbase-server/395/testReport/org.apache.hadoop.hbase.regionserver/TestRegionMergeTransactionOnCluster/testWholesomeMerge/ :

      013-07-11 09:33:44,154 INFO  [AM.ZK.Worker-pool-2-thread-2] master.RegionStates(309): Offlined 3ffefd878a234031675de6b2c70b2ead from ip-10-174-118-204.us-west-1.compute.internal,60498,1373535184820
      2013-07-11 09:33:44,154 INFO  [AM.ZK.Worker-pool-2-thread-2] master.AssignmentManager$4(1223): The master has opened testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead. that was online on ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
      2013-07-11 09:33:44,182 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1] zookeeper.ZKAssign(862): regionserver:59210-0x13fcd13a20c0002 Successfully transitioned node 3ffefd878a234031675de6b2c70b2ead from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED
      2013-07-11 09:33:44,182 INFO  [MASTER_TABLE_OPERATIONS-ip-10-174-118-204:39405-0] handler.DispatchMergingRegionHandler(154): Failed send MERGE REGIONS RPC to server ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884 for region testWholesomeMerge,,1373535210124.efcb10dcfa250e31bfd50dc6c7049f32.,testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead., focible=false, org.apache.hadoop.hbase.exceptions.RegionOpeningException: Region is being opened: 3ffefd878a234031675de6b2c70b2ead
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2566)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3862)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.mergeRegions(HRegionServer.java:3649)
      	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:14400)
      	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2124)
      	at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1831)
      
      2013-07-11 09:33:44,182 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1] handler.OpenRegionHandler(373): region transitioned to opened in zookeeper: {ENCODED => 3ffefd878a234031675de6b2c70b2ead, NAME => 'testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead.', STARTKEY => 'testRow0020', ENDKEY => 'testRow0040'}, server: ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
      2013-07-11 09:33:44,183 DEBUG [RS_OPEN_REGION-ip-10-174-118-204:59210-1] handler.OpenRegionHandler(186): Opened testWholesomeMerge,testRow0020,1373535210125.3ffefd878a234031675de6b2c70b2ead. on server:ip-10-174-118-204.us-west-1.compute.internal,59210,1373535184884
      

      We can see that MASTER_TABLE_OPERATIONS thread couldn't get region 3ffefd878a234031675de6b2c70b2ead because RS_OPEN_REGION thread finished region opening 1 millisecond later.

      One solution is to retry operation when receiving RegionOpeningException

      Attachments

        1. 8940-addendum.patch
          0.7 kB
          Chunhui Shen
        2. 8940-trunk-v2.patch
          3 kB
          Chunhui Shen
        3. 8940-v1.txt
          2 kB
          Ted Yu
        4. 8940v3.txt
          3 kB
          Michael Stack

        Issue Links

          Activity

            People

              zjushch Chunhui Shen
              yuzhihong@gmail.com Ted Yu
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: