Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-5174

Coalesce aborted tasks in the TaskMonitor

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Duplicate
    • 0.92.0
    • 0.92.3
    • None
    • None

    Description

      Some tasks can get repeatedly canceled like flushing when splitting is going on, in the logs it looks like this:

      2012-01-10 19:28:29,164 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap pressure
      2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, writesEnabled=false
      2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=1.6g
      2012-01-10 19:28:29,164 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap pressure
      2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, writesEnabled=false
      2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush thread woke up because memory above low water=1.6g
      2012-01-10 19:28:29,164 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Flush of region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c. due to global heap pressure
      2012-01-10 19:28:29,164 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c., flushing=false, writesEnabled=false
      

      But in the TaskMonitor UI you'll get MAX_TASKS (1000) displayed on top of the regions. Basically 1000x:

      Tue Jan 10 19:28:29 UTC 2012	Flushing test1,,1326223218996.3eea0d89af7b851c3a9b4246389a4f2c.	ABORTED (since 31sec ago)	Not flushing since writes not enabled (since 31sec ago)
      

      It's ugly and I'm sure some users will freak out seeing this, plus you have to scroll down all the way to see your regions. Coalescing consecutive aborted tasks seems like a good solution.

      Attachments

        Activity

          People

            Unassigned Unassigned
            jdcryans Jean-Daniel Cryans
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: