HBase / HBASE-20908

Infinite loop on regionserver if region replicas are reduced


    Details

    • Hadoop Flags:
      Reviewed

      Description

      Steps to reproduce

      hbase(main):003:0> create 'myTable','cf',{REGION_REPLICATION=>3}
      
      
      hbase(main):003:0> put 'myTable','r1','cf:col1','1'
      0 row(s) in 0.1230 seconds
      
      hbase(main):004:0> disable 'myTable'
      0 row(s) in 2.3040 seconds
      
      hbase(main):005:0> alter 'myTable',{REGION_REPLICATION=>1}
      Updating all regions with the new schema...
      1/1 regions updated.
      Done.
      0 row(s) in 11.9550 seconds
      
      hbase(main):006:0> enable 'myTable'
      0 row(s) in 1.2620 seconds
      
      hbase(main):007:0> put 'myTable','r2','cf:col1','1'
      0 row(s) in 0.0060 seconds
      
      

      This is the request for the replica region, which is no longer present in meta but is still in the location cache. The server responds that it is not serving this region:

      com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.NotServingRegionException): org.apache.hadoop.hbase.NotServingRegionException: Region d997d9b47a106216b9b117617ec09015 is not online on 10.22.9.76,16020,1531341039091
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3124)
      	at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3106)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.replay(RSRpcServices.java:1714)
      	at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22773)
      	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2150)
      	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:187)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:167)
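The sequence above can be illustrated with a small, self-contained Java model. It is not HBase code: `StaleReplicaCache`, `reduceReplicas`, `send`, and the string results are hypothetical stand-ins for the location cache, hbase:meta, and the region server's response. The model assumes the ALTER does not touch the cache, so the first request for replica 1 hits the stale entry and fails with NotServingRegionException, and every later lookup finds no location in meta.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (not HBase code) of a stale location cache for a removed replica:
 * first a NotServingRegionException, then a failed meta lookup on every retry.
 */
public class StaleReplicaCache {
    // Hypothetical stand-in for hbase:meta: replica id -> hosting server.
    private final Map<Integer, String> meta = new HashMap<>();
    // Hypothetical location cache, populated before the ALTER.
    private final Map<Integer, String> cache = new HashMap<>();
    // Replica ids whose regions are actually open on a server.
    private final Set<Integer> online = new HashSet<>();

    StaleReplicaCache(int initialReplicas) {
        for (int id = 0; id < initialReplicas; id++) {
            meta.put(id, "rs1");
            cache.put(id, "rs1");
            online.add(id);
        }
    }

    /** Simulates ALTER ... REGION_REPLICATION => newCount; the cache is NOT updated. */
    void reduceReplicas(int newCount) {
        meta.keySet().removeIf(id -> id >= newCount);
        online.removeIf(id -> id >= newCount);
    }

    /** One attempt to ship an edit to the given replica. */
    String send(int replicaId) {
        String server = cache.get(replicaId);
        if (server == null) {
            server = meta.get(replicaId);        // re-read the location from "meta"
            if (server == null) {
                return "Can't get the location";  // replica no longer listed in meta
            }
            cache.put(replicaId, server);
        }
        if (!online.contains(replicaId)) {
            cache.remove(replicaId);             // invalidate the stale cache entry
            return "NotServingRegionException";
        }
        return "OK";
    }

    public static void main(String[] args) {
        StaleReplicaCache model = new StaleReplicaCache(3);
        model.reduceReplicas(1);
        List<String> results = new ArrayList<>();
        for (int attempt = 0; attempt < 3; attempt++) {
            results.add(model.send(1));
        }
        // [NotServingRegionException, Can't get the location, Can't get the location]
        System.out.println(results);
    }
}
```

Replica 0 (the primary) keeps working in this model; only requests for the removed replica ids fail, mirroring the two stack traces in this report.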
      

      Eventually, when the cache is refreshed from meta, replication gets into an infinite loop: the edit can never be replicated, because the location of the removed replica never reappears in meta.

      java.net.SocketTimeoutException: callTimeout=1200000, callDuration=2181316: Can't get the location null
      	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:170)
      	at org.apache.hadoop.hbase.replication.regionserver.RegionReplicaReplicationEndpoint$RetryingRpcCallable.call(RegionReplicaReplicationEndpoint.java:606)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
      	at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:178)
      	at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getLocation(RegionAdminServiceCallable.java:105)
      	at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.prepare(RegionAdminServiceCallable.java:89)
      	at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
      	... 5 more
      Caused by: java.io.IOException: HRegionInfo was null in myTable, row=keyvalues={myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:regioninfo/1531262022425/Put/vlen=41/seqid=0, myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:seqnumDuringOpen/1531341209944/Put/vlen=8/seqid=0, myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:server/1531341209944/Put/vlen=16/seqid=0, myTable,,1531262022075.f2b68622cfd5851023be29d5599db6c9./info:serverstartcode/1531341209944/Put/vlen=8/seqid=0}
      	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1289)
      	at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1179)
      	at org.apache.hadoop.hbase.client.RegionAdminServiceCallable.getRegionLocations(RegionAdminServiceCallable.java:170)
      	... 8 more
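A minimal sketch of why the loop never terminates, again as a toy Java model rather than the real RegionReplicaReplicationEndpoint. `ReplicaShipper`, `shipWithoutGuard`, `shipWithGuard`, and the `maxAttempts` bound are hypothetical; the bound exists only so the sketch terminates. The guard, which compares the replica id against the table's current REGION_REPLICATION and drops the edit, is an assumed way to break the loop and is not a description of the attached patches.

```java
import java.util.Map;
import java.util.function.IntFunction;

/**
 * Toy model (not HBase code) of a shipper that keeps retrying an edit for a
 * replica whose location can no longer be found in meta.
 */
public class ReplicaShipper {
    /** Outcome of trying to ship one edit. */
    enum Outcome { SHIPPED, SKIPPED, GAVE_UP }

    /**
     * Retries until the location resolves. maxAttempts stands in for "forever"
     * so the sketch terminates; in the report the edit is never delivered.
     */
    static Outcome shipWithoutGuard(int replicaId, IntFunction<String> locate, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (locate.apply(replicaId) != null) {
                return Outcome.SHIPPED;
            }
        }
        return Outcome.GAVE_UP;
    }

    /**
     * Hypothetical guard: drop edits for replica ids that no longer exist
     * under the table's current REGION_REPLICATION.
     */
    static Outcome shipWithGuard(int replicaId, int currentReplication,
                                 IntFunction<String> locate, int maxAttempts) {
        if (replicaId >= currentReplication) {
            return Outcome.SKIPPED;
        }
        return shipWithoutGuard(replicaId, locate, maxAttempts);
    }

    public static void main(String[] args) {
        // After ALTER ... REGION_REPLICATION => 1, "meta" only knows replica 0.
        Map<Integer, String> meta = Map.of(0, "rs1");
        IntFunction<String> locate = id -> meta.get(id);
        int currentReplication = 1;

        System.out.println(shipWithoutGuard(1, locate, 5));                  // GAVE_UP
        System.out.println(shipWithGuard(1, currentReplication, locate, 5)); // SKIPPED
        System.out.println(shipWithGuard(0, currentReplication, locate, 5)); // SHIPPED
    }
}
```

Without the guard, raising `maxAttempts` only delays the failure for replica 1, because no number of retries makes its location reappear; the guarded version discards the edit and still ships edits for replica 0.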
      
      

        Attachments

        1. 20908_v3.patch
          13 kB
          Ted Yu
        2. 20908_v3-branch-1.patch
          14 kB
          Ted Yu
        3. HBASE-20908_v1.patch
          14 kB
          Ankit Singhal
        4. HBASE-20908_v3.patch
          13 kB
          Ankit Singhal
        5. HBASE-20908_v3-branch-1.patch
          14 kB
          Ankit Singhal
        6. HBASE-20908_v3-branch-1.patch
          14 kB
          Ted Yu
        7. HBASE-20908_v3-branch-1.patch
          14 kB
          Ankit Singhal
        8. HBASE-20908.patch
          22 kB
          Ankit Singhal


            People

            • Assignee:
              ankit@apache.org Ankit Singhal
              Reporter:
              ankit@apache.org Ankit Singhal
            • Votes:
              0
              Watchers:
              8

              Dates

              • Created:
                Updated:
                Resolved: