Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-27509

Possible region gets stuck in CLOSING state

    XMLWordPrintableJSON

Details

    • Bug
    • Status: In Progress
    • Critical
    • Resolution: Unresolved
    • 2.3.4
    • 3.0.0-beta-2
    • Region Assignment
    • None

    Description

      There is a possible chance of region gets stuck in closing state could be because of race between the flush and close or some where the readlock acquired on the region is not getting released.

      "MemStoreFlusher.1" #236 prio=5 os_prio=0 tid=0x00005639266a4000 nid=0x296e waiting on condition [0x00007fdc48a63000]
         java.lang.Thread.State: WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for  <0x00007fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
          at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
          at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)
      
      "MemStoreFlusher.0" #234 prio=5 os_prio=0 tid=0x00005639266a2800 nid=0x296d waiting on condition [0x00007fdc48b64000]
         java.lang.Thread.State: WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for  <0x00007fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
          at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
          at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2397)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:610)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:579)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:67)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:359)
      
      "RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0" #6337 daemon prio=5 os_prio=0 tid=0x00007fdc05448800 nid=0x15d1 waiting on condition [0x00007fdc1befd000]
         java.lang.Thread.State: WAITING (parking)
          at sun.misc.Unsafe.park(Native Method)
          - parking to wait for  <0x00007fdf42dde850> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
          at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
          at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
          at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1662)
          at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1591)
          - locked <0x00007fdf42ddf358> (a java.lang.Object)
          at org.apache.hadoop.hbase.regionserver.handler.UnassignRegionHandler.process(UnassignRegionHandler.java:114)
          at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:750)
      

      From one of the region server logs flushed has started and replay edits of flush added then close requested then it got stuck in the transition and no further processing of the requests on the region.

        2022-11-23 05:51:00,503 INFO  [MemStoreFlusher.1] regionserver.DefaultStoreFlusher: Flushed memstore data size=232.14 MB at sequenceid=29163855 (bloomFilter=true), to=hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/8f342bca97c115f8bce460a998e4afbc/.tmp/cf1/c20f27a7407643b18558331d95f7a67f
        2022-11-23 05:51:00,530 INFO  [MemStoreFlusher.1] regionserver.HStore: Added hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/8f342bca97c115f8bce460a998e4afbc/cf1/c20f27a7407643b18558331d95f7a67f, entries=231413, sequenceid=29163855, filesize=233.7 M
        2022-11-23 05:51:00,536 INFO  [MemStoreFlusher.1] regionserver.HRegion: Finished flush of dataSize ~232.14 MB/243420836, heapSize ~256.00 MB/268439136, currentSize=719.06 KB/736320 for 8f342bca97c115f8bce460a998e4afbc in 2146ms, sequenceid=29163855, compaction requested=true
        2022-11-23 05:51:00,541 INFO  [MemStoreFlusher.1] regionserver.HRegion: Flushing d92c7546be62225859dd641aa88992ea 1/1 column families, dataSize=232.62 MB heapSize=256.53 MB
        2022-11-23 05:51:00,645 INFO  [MemStoreFlusher.0] regionserver.DefaultStoreFlusher: Flushed memstore data size=232.15 MB at sequenceid=29037517 (bloomFilter=true), to=hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/7beadd786ffc23edc074238c873f800b/.tmp/cf1/eb542141078e4f879128498d01517e98
        2022-11-23 05:51:00,705 INFO  [MemStoreFlusher.0] regionserver.HStore: Added hdfs://OCEDEV/apps/hbase/data/data/default/hbase_perf_wl2_1000m/7beadd786ffc23edc074238c873f800b/cf1/eb542141078e4f879128498d01517e98, entries=231402, sequenceid=29037517, filesize=233.7 M
        2022-11-23 05:51:00,706 INFO  [MemStoreFlusher.0] regionserver.HRegion: Finished flush of dataSize ~232.15 MB/243424206, heapSize ~256.01 MB/268442768, currentSize=935.82 KB/958277 for 7beadd786ffc23edc074238c873f800b in 2324ms, sequenceid=29037517, compaction requested=true
        2022-11-23 05:51:00,708 INFO  [MemStoreFlusher.0] regionserver.HRegion: Flushing 6c6694dce02a13f8109ecc3dd70009d5 1/1 column families, dataSize=232.68 MB heapSize=256.60 MB
        2022-11-23 05:51:00,771 INFO  [RS_CLOSE_REGION-regionserver/sl73tskrnsqln00107:16020-0] handler.UnassignRegionHandler: Close 8f342bca97c115f8bce460a998e4afbc
      
      2022-11-23 07:37:06,887 WARN  [RpcServer.default.FPBQ.Fifo.handler=28,queue=8,port=16020] regionserver.HRegion: Region is too busy to allow lock acquisition.
      org.apache.hadoop.hbase.RegionTooBusyException: Failed to obtain lock; regionName=hbase_perf_wl2_1000m,user8104,1668039117539.8f342bca97c115f8bce460a998e4afbc., server=sl73tskrnsqln00107.visa.com,16020,1669131710569
      	at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8726)
      	at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8705)
      	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8610)
      	at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:3161)
      	at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutate(RSRpcServices.java:2985)
      	at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45517)
      	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393)
      	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
      	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
      
      

      Attachments

        Activity

          People

            rajeshbabu Rajeshbabu Chintaguntla
            rajeshbabu Rajeshbabu Chintaguntla
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated: