HBASE-21031

Memory leak if replay edits failed during region opening

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1.0, 2.0.1
    • Fix Version/s: 3.0.0, 2.2.0, 2.1.1, 2.0.2
    • Component/s: None
    • Labels:
      None

      Description

      Due to HBASE-21029, when replaying edits that contain many identical cells, the memstore is never flushed, and an exception is thrown once all heap space has been used:

      2018-08-06 15:52:27,590 ERROR [RS_OPEN_REGION-regionserver/hb-bp10cw4ejoy0a2f3f-009:16020-2] handler.OpenRegionHandler(302): Failed open of region=hbase_test,dffa78,1531227033378.cbf9a2daf3aaa0c7e931e9c9a7b53f41., starting to roll back the global memstore size.
      java.lang.OutOfMemoryError: Java heap space
              at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
              at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
              at org.apache.hadoop.hbase.regionserver.OnheapChunk.allocateDataBuffer(OnheapChunk.java:41)
              at org.apache.hadoop.hbase.regionserver.Chunk.init(Chunk.java:104)
              at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:226)
              at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:180)
              at org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:163)
              at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:273)
              at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:148)
              at org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:111)
              at org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:178)
              at org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:287)
              at org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:107)
              at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:706)
              at org.apache.hadoop.hbase.regionserver.HRegion.restoreEdit(HRegion.java:5494)
              at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4608)
              at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4404)
      

      After this exception, the memstore is not rolled back, and because MSLAB is in use, none of the allocated chunks are ever released. That memory is leaked permanently.

      We need to roll back the memory if opening the region fails (for now, only the global memstore size is decreased after a failure).
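The rollback described above can be sketched roughly as follows. This is a minimal illustration and not the code from the attached patches: `OpenRollbackSketch`, `ChunkPool`, `MemStoreSketch`, and `openRegion` are hypothetical names standing in for the MSLAB chunk allocator, the region's memstore, and the region-open path. The point is only that a failure during replay should release every chunk the memstore took, not just adjust a size counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; none of these types are real HBase classes. */
class OpenRollbackSketch {

    /** Stand-in for a server-wide chunk allocator that tracks outstanding bytes. */
    static final class ChunkPool {
        final AtomicLong outstandingBytes = new AtomicLong();

        long allocate(long size) {
            outstandingBytes.addAndGet(size);
            return size;
        }

        void release(long size) {
            outstandingBytes.addAndGet(-size);
        }
    }

    /** Stand-in for a region's memstore that remembers the chunks it holds. */
    static final class MemStoreSketch {
        private final ChunkPool pool;
        private final List<Long> chunks = new ArrayList<>();

        MemStoreSketch(ChunkPool pool) {
            this.pool = pool;
        }

        void add(long cellSize) {
            chunks.add(pool.allocate(cellSize));
        }

        /** Returns every chunk to the pool. */
        void close() {
            for (long chunk : chunks) {
                pool.release(chunk);
            }
            chunks.clear();
        }
    }

    /**
     * Replays the given edit sizes into a fresh memstore. A failure is
     * simulated at index failAt (pass -1 for no failure). On any failure,
     * including an Error such as OutOfMemoryError, the memstore is closed
     * so that its chunks go back to the pool.
     */
    static boolean openRegion(ChunkPool pool, long[] edits, int failAt) {
        MemStoreSketch memstore = new MemStoreSketch(pool);
        try {
            for (int i = 0; i < edits.length; i++) {
                if (i == failAt) {
                    throw new IllegalStateException("simulated replay failure");
                }
                memstore.add(edits[i]);
            }
            return true;
        } catch (Throwable t) {
            memstore.close(); // roll back the memory, not only a counter
            return false;
        }
    }
}
```

With this shape, a failed open leaves `outstandingBytes` where it was before the replay started, while a successful open keeps the chunks it allocated.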

      Another problem is that we use replayEditsPerRegion in RegionServerAccounting to record how much memory is used during replay, and decrease the global memstore size by that amount if the replay fails. This is not right: the memstore may also be flushed during replay, so the size recorded in the replayEditsPerRegion map is not accurate.
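A small worked example may make the accounting problem concrete. The class below is a hypothetical model, not RegionServerAccounting itself: `replayedBytes` plays the role of a per-region replay counter that only ever grows, and `flush()` models a memstore flush happening in the middle of a replay. All names and numbers are assumptions chosen for illustration.

```java
/** Hypothetical model of the accounting mismatch; not HBase code. */
class ReplayAccountingSketch {
    long globalMemStoreSize; // bytes held server-wide, across all regions
    long regionMemStoreSize; // bytes this region's memstore currently holds
    long replayedBytes;      // running total of replayed bytes, never reduced by a flush

    ReplayAccountingSketch(long heldByOtherRegions) {
        this.globalMemStoreSize = heldByOtherRegions;
    }

    /** Replaying an edit grows all three numbers. */
    void replay(long size) {
        globalMemStoreSize += size;
        regionMemStoreSize += size;
        replayedBytes += size;
    }

    /** A flush empties the region's memstore but leaves the replay counter alone. */
    void flush() {
        globalMemStoreSize -= regionMemStoreSize;
        regionMemStoreSize = 0;
    }

    /** Global size after a rollback that subtracts the replay counter. */
    long rollbackByReplayCounter() {
        return globalMemStoreSize - replayedBytes;
    }

    /** Global size after a rollback that subtracts what the region actually holds. */
    long rollbackByHeldSize() {
        return globalMemStoreSize - regionMemStoreSize;
    }
}
```

Suppose other regions hold 100 bytes, and this region replays 40 bytes, flushes, then replays 10 more before failing. The global size is 110 at the point of failure. Subtracting the replay counter (50) gives 60, which under-reports what the other regions still hold; subtracting the region's actual size (10) gives the correct 100.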

        Attachments

        1. memoryleak.png (117 kB, Allan Yang)
        2. HBASE-21031.branch-2.0.006.patch (19 kB, Allan Yang)
        3. HBASE-21031.branch-2.0.006.patch (19 kB, stack)
        4. HBASE-21031.branch-2.0.005.patch (20 kB, Allan Yang)
        5. HBASE-21031.branch-2.0.004.patch (20 kB, Allan Yang)
        6. HBASE-21031.branch-2.0.003.patch (19 kB, Allan Yang)
        7. HBASE-21031.branch-2.0.002.patch (18 kB, Allan Yang)
        8. HBASE-21031.branch-2.0.001.patch (15 kB, Allan Yang)


              People

              • Assignee: Allan Yang (allan163)
              • Reporter: Allan Yang (allan163)
              • Votes: 0
              • Watchers: 9
