HBase / HBASE-25899

Improve efficiency of SnapshotHFileCleaner

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-1, 2.0.0
    • Fix Version/s: 3.0.0-alpha-1, 2.5.0
    • Component/s: master
    • Labels: None

    Description

      We ran into the same problem of thousands of threads described in HBASE-22867, but after that issue the cleaner became even less efficient.

      The jstack output shows that most dir-scan threads are blocked in SnapshotHFileCleaner#getDeletableFiles:

      "dir-scan-pool-19" #694 daemon prio=5 os_prio=0 tid=0x0000000002ab1800 nid=0x26a7e waiting for monitor entry [0x00007fb0a9913000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:74)
              - waiting to lock <0x00007fb148737048> (a org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner)
              at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:498)
              at org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$traverseAndDelete$1(CleanerChore.java:246)
              at org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$41/1187372779.act(Unknown Source)
              at org.apache.hadoop.hbase.master.cleaner.CleanerChore.deleteAction(CleanerChore.java:358)
              at org.apache.hadoop.hbase.master.cleaner.CleanerChore.traverseAndDelete(CleanerChore.java:246)
              at org.apache.hadoop.hbase.master.cleaner.CleanerChore.lambda$null$2(CleanerChore.java:255)
              at org.apache.hadoop.hbase.master.cleaner.CleanerChore$$Lambda$38/2003131501.run(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
              at java.lang.Thread.run(Thread.java:745)

      and all the HFileCleaner threads are idle, waiting on the delete task queue:

      "gha-data-hbase0002:16000.activeMasterManager-HFileCleaner.large.2-1621210982419" #358 daemon prio=5 os_prio=0 tid=0x00007fb967fc0000 nid=0x266f2 waiting on condition [0x00007fb0c57d6000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0x00007fb1486db9f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
              at org.apache.hadoop.hbase.util.StealJobQueue.take(StealJobQueue.java:106)
              at org.apache.hadoop.hbase.master.cleaner.HFileCleaner.consumerLoop(HFileCleaner.java:264)
              at org.apache.hadoop.hbase.master.cleaner.HFileCleaner$1.run(HFileCleaner.java:233)
      

      So file scanning needs to be sped up. But since getDeletableFiles is a synchronized method, only one dir-scan thread can run it at a time, and increasing the number of dir-scan threads cannot solve this problem.
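
      A minimal, self-contained sketch of why extra threads do not help here. The class and method names below are hypothetical and are not HBase code; the sleep merely stands in for the work done while the monitor is held. However many pool threads call the synchronized method, at most one is inside it at any moment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of HBase.
class SynchronizedBottleneckDemo {
  private final AtomicInteger inside = new AtomicInteger();
  private final AtomicInteger maxInside = new AtomicInteger();

  // Stand-in for a synchronized getDeletableFiles: the instance monitor
  // admits one caller at a time, the rest are BLOCKED on the monitor.
  synchronized void check() throws InterruptedException {
    int now = inside.incrementAndGet();
    maxInside.accumulateAndGet(now, Math::max);
    Thread.sleep(5); // simulated work while holding the lock
    inside.decrementAndGet();
  }

  // Runs `threads` concurrent callers and reports the highest number
  // that were ever inside check() simultaneously.
  static int maxConcurrency(int threads) {
    SynchronizedBottleneckDemo demo = new SynchronizedBottleneckDemo();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < threads; i++) {
      pool.submit(() -> {
        demo.check();
        return null; // Callable form, so check() may throw
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return demo.maxInside.get();
  }
}
```

      Calling SynchronizedBottleneckDemo.maxConcurrency(8) reports 1 regardless of the pool size, which matches the BLOCKED dir-scan threads in the dump above.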

      After looking through the code in SnapshotHFileCleaner and SnapshotFileCache, I think the lock granularity in both should be optimized.
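
      One possible shape of a finer-grained lock, shown only as a sketch and not as the change that was actually committed for this issue. It assumes the cache can be reduced to a set of referenced file names; SnapshotRefCacheSketch, refresh and the String-based signatures are hypothetical. Scan threads share a read lock, and the exclusive write lock is held only while the refreshed set is swapped in.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a snapshot file cache; not the actual HBase class.
class SnapshotRefCacheSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Names of files still referenced by some snapshot (assumed representation).
  private Set<String> referenced = new HashSet<>();

  // The exclusive lock is held only for the reference swap;
  // the new set is built before the lock is taken.
  void refresh(Set<String> latestReferenced) {
    Set<String> copy = new HashSet<>(latestReferenced);
    lock.writeLock().lock();
    try {
      referenced = copy;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Any number of scan threads may hold the read lock at once,
  // so they no longer queue behind each other.
  List<String> getDeletableFiles(List<String> candidates) {
    List<String> deletable = new ArrayList<>();
    lock.readLock().lock();
    try {
      for (String name : candidates) {
        if (!referenced.contains(name)) {
          deletable.add(name);
        }
      }
    } finally {
      lock.readLock().unlock();
    }
    return deletable;
  }
}
```

      The trade-off is that a refresh still briefly blocks all readers, so this only pays off when refreshes are rare compared with lookups.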

       

      Attachments

        1. 78631.jstack
          454 kB
          Xiaolin Ha
        2. cleaner-result.png
          699 kB
          Xiaolin Ha

            People

              Assignee: Xiaolin Ha
              Reporter: Xiaolin Ha
              Votes: 0
              Watchers: 4
