Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-27413

MemStoreFlusher: Cache flush failed for region

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 2.2.6
    • None
    • None
    • None
    • Hadoop 3.3.1

      HBase 2.2.6

      jdk: openjdk version "1.8.0_342"

       

    Description

      We are running an HBase cluster (Hadoop 3.3.1 + HBase 2.2.6) when clients frequently write a lot of data into this HBase cluster, we got the message "Cache flush failed for the region" error log. then got a lot of "responseTooSlow" logs. below are the sample logs of the "Cache flush failed for region" messages:

      022-08-09 11:04:05,069 DEBUG [MemStoreFlusher.0] regionserver.HRegion: Flush status journal:
          Acquiring readlock on region at 1660067945028
          Running coprocessor pre-flush hooks at 1660067945028
          Obtaining lock to block concurrent updates at 1660067945028
          Preparing flush snapshotting stores in dfab869db13603b7800cf7b33d58a885 at 1660067945028
          Finished memstore snapshotting ns1:htable1,77e1422d7b070b9ee3ac9466174655e884d41d87bbe9182e05661cf11d02298b,1653473214632.dfab869db13603b7800cf7b33d58a885., syncing WAL and waiting on mvcc, flushsize=dataSize=28417, getHeapSize=59360, getOffHeapSize=0, getCellsCount=284 at 1660067945028
      2022-08-09 11:04:05,070 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: Cache flush failed for region ns1:htable1,77e1422d7b070b9ee3ac9466174655e884d41d87bbe9182e05661cf11d02298b,1653473214632.dfab869db13603b7800cf7b33d58a885.
      org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=77949378, WAL system stuck?
          at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:145)
          at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:722)
          at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:616)
          at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:593)
          at org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2754)
          at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2691)
          at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2549)
          at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2523)
          at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2413)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:611)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:580)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
          at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:360)
          at java.lang.Thread.run(Thread.java:748)
      2022-08-09 11:04:05,070 INFO  [MemStoreFlusher.0] hbase.RangerAuthorizationCoprocessor: Unable to get remote Address
      2022-08-09 11:04:05,071 DEBUG [MemStoreFlusher.0] regionserver.HRegion: Flush column family: ai of 326f48e574a65d7ba599a68dbf742603 because time of oldest edit=1660064392583 is > 3600000 from now =1660068245071
      2022-08-09 11:04:05,072 INFO  [MemStoreFlusher.0] regionserver.HRegion: Flushing 326f48e574a65d7ba599a68dbf742603 1/3 column families, dataSize=111.73 KB heapSize=184.23 KB; ai={dataSize=12.15 KB, heapSize=24.73 KB, offHeapSize=0 B}
      2022-08-09 11:04:05,275 WARN  [RpcServer.default.FPBQ.Fifo.handler=42,queue=0,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Multi(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$MultiRequest)","multi.gets":0,"starttimems":"1660067945231","responsesize":"1659","method":"Multi","param":"region\u003d ns1:htable12,a8f9942ef122d7bb0d0b0036cc5d776d030f9c06b7af0ebb2908edd5a4f43b4c,1460489972562.204291859a0abda7305f73ccb03b174c., for 1 act \u003cTRUNCATED\u003e","processingtimems":300043,"client":"172.16.139.217:35044","queuetimems":0,"multi.servicecalls":0,"class":"HRegionServer","multi.mutations":1}
      2022-08-09 11:04:05,561 WARN  [RpcServer.default.FPBQ.Fifo.handler=14,queue=2,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Multi(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$MultiRequest)","multi.gets":0,"starttimems":"1660067945519","responsesize":"1659","method":"Multi","param":"region\u003d ns1:htable13,a00bd53d,1502528412031.f04f3634f3822a67dccf8286cfe3e19a., for 1 action(s) and 1st row key\u003da0fd50edde8a734b63a39899f4572a8 \u003cTRUNCATED\u003e","processingtimems":300042,"client":"172.16.139.215:38654","queuetimems":0,"multi.servicecalls":0,"class":"HRegionServer","multi.mutations":1} 

       

      Attachments

        Activity

          People

            Unassigned Unassigned
            MarchWang March Wang
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: