HBASE-28271

Infinite waiting on lock acquisition by snapshot can result in unresponsive master

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0-alpha-4, 2.4.17, 2.5.7
    • Fix Version/s: 2.6.0, 2.5.8, 3.0.0-beta-2
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      When a region is stuck in transition (RIT) for a significant time, any attempt to take a snapshot of the table leaves a master handler thread waiting indefinitely. To create a snapshot of an enabled or disabled table, the snapshot handler executes a LockProcedure to acquire the table-level lock. If any region of the table is in transition, the LockProcedure cannot be executed, so the handler waits until the region transition completes and the table-level lock can be acquired.

      If a region stays in RIT for a considerable time and the client makes enough attempts to create snapshots of the table, these requests can easily exhaust all handler threads, potentially leaving the master unresponsive. A sample thread dump is attached.
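
      The exhaustion effect can be illustrated with a small, self-contained sketch. It does not use any HBase code: a fixed-size thread pool stands in for the master handler threads, a CountDownLatch stands in for the region leaving transition, and the names HandlerExhaustionSketch and probeServed are hypothetical. Once every pool thread is parked on a blocked snapshot request, an unrelated request is never served.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerExhaustionSketch {

    /**
     * Simulates a master with a fixed number of handler threads (assumption:
     * a plain fixed thread pool, not the real HBase RPC executor).
     * Returns true if an unrelated probe request is served within waitMs.
     */
    static boolean probeServed(int handlers, int blockedSnapshots, long waitMs)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        // Stands in for "the region has left transition"; never released
        // while the probe is waiting.
        CountDownLatch regionOutOfTransition = new CountDownLatch(1);
        try {
            // Each snapshot request blocks its handler until the region settles.
            for (int i = 0; i < blockedSnapshots; i++) {
                pool.submit(() -> {
                    regionOutOfTransition.await();
                    return null;
                });
            }
            // An unrelated request that needs a free handler thread.
            Future<String> probe = pool.submit(() -> "ok");
            try {
                probe.get(waitMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false;
            }
        } finally {
            regionOutOfTransition.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("one free handler, probe served: " + probeServed(4, 3, 2000));
        System.out.println("all handlers blocked, probe served: " + probeServed(4, 4, 200));
    }
}
```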

      Proposal: the snapshot handler should not wait forever if it cannot acquire the table-level lock; it should fail fast.
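
      A minimal sketch of the fail-fast idea, assuming a bounded wait on the lock. This is not the actual patch and does not use LockProcedure: a java.util.concurrent ReentrantLock stands in for the table-level lock, and FailFastLockSketch, runWithLock and failsFastWhileHeld are hypothetical names. The caller gives up with an exception after a timeout instead of holding a handler thread indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class FailFastLockSketch {

    /**
     * Runs the action under the lock, but waits at most timeoutMs for it.
     * Throws IllegalStateException rather than blocking forever.
     */
    static void runWithLock(ReentrantLock lock, long timeoutMs, Runnable action)
            throws InterruptedException {
        if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException(
                "table lock not acquired within " + timeoutMs + " ms");
        }
        try {
            action.run();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Returns true if a snapshot attempt fails fast while another thread
     * (standing in for a region in transition) holds the lock.
     */
    static boolean failsFastWhileHeld(long timeoutMs) throws InterruptedException {
        ReentrantLock tableLock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            tableLock.lock();
            try {
                held.countDown();
                release.await(); // keep the lock until the test is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                tableLock.unlock();
            }
        });
        holder.start();
        held.await(); // make sure the lock is really taken first
        try {
            runWithLock(tableLock, timeoutMs, () -> { });
            return false;
        } catch (IllegalStateException e) {
            return true;
        } finally {
            release.countDown();
            holder.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("failed fast: " + failsFastWhileHeld(100));
    }
}
```

      With a bounded wait, a snapshot request against a table with a stuck region returns an error to the client and frees its handler thread, so repeated attempts cannot accumulate.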

      Attachments

        1. image.png
          556 kB
          Viraj Jasani

            People

              Assignee: Viraj Jasani (vjasani)
              Reporter: Viraj Jasani (vjasani)
              Votes: 0
              Watchers: 7