Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-2439

HBase can get stuck if updates to META are blocked

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 0.90.0
    • None
    • None
    • Reviewed

    Description

      (We noticed this on a import-style test in a small test cluster.)

      If compactions are running slow, and we are doing a lot of region splits, then, since META has a much smaller hard-coded memstore flush size (16KB), it quickly accumulates lots of store files. Once this exceeds "hbase.hstore.blockingStoreFiles", flushes to META become no-ops. This causes METAs memstore footprint to grow. Once this exceeds "hbase.hregion.memstore.block.multiplier * 16KB", we block further updates to META.

      In my test setup:
      hbase.hregion.memstore.block.multiplier = 4.
      and,
      hbase.hstore.blockingStoreFiles = 15.

      And we saw messages of the form:

      2010-04-09 18:37:39,539 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 23 on 60020' on region .META.,,1: memstore size 64.2k is >= than blocking 64.0k size
      

      Now, if around the same time the CompactSplitThread does a compaction and determines it is going split the region. As part of finishing the split, it wants to update META about the daughter regions.

      It'll end up waiting for the META to become unblocked. The single CompactSplitThread is now held up, and no further compactions can proceed. META's compaction request is itself blocked because the compaction queue will never get cleared.

      This essentially creates a deadlock and the region server is able to not progress any further. Eventually, each region server's CompactSplitThread ends up in the same state.

      Attachments

        Activity

          People

            kannanm Kannan Muthukkaruppan
            kannanm Kannan Muthukkaruppan
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: