HBase / HBASE-27951

Use ADMIN_QOS in MasterRpcServices for regionserver operational dependencies

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.10
    • Fix Version/s: 2.6.0, 2.4.18, 2.5.6, 3.0.0-beta-1
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Analysis of a recent production incident is not yet complete, but one item of note is an apparent deadlock. Imagine you are gracefully draining a regionserver by way of a flurry of moveRegion requests. The handler for moveRegion submits a TRSP (TransitRegionStateProcedure) and then waits on its future without a timeout. Imagine that there are enough moveRegion requests to tie up the entire normal priority master RPC handler pool. Now imagine that all of those requests are waiting on TRSPs pending on a regionserver that is concurrently bounced, or that fails. The TRSPs are blocked in REGION_STATE_TRANSITION_CLOSE because the target regionserver terminated before responding to the close requests. That blocks the moveRegion requests, which in turn block the RPC handlers. The regionserver restarts and tries to check in, but it cannot report to the master because there are no free normal priority handlers to serve the request. It does not seem correct to have the regionserver operational dependencies (regionServerStartup, regionServerReport, and reportFatalRSError) contending with normal priority requests.

      They should be made ADMIN_QOS priority to avoid this case. 
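
      The starvation pattern described above can be reproduced in miniature with plain java.util.concurrent executors. The following is only a sketch under stated assumptions, not HBase code: the class name, the pool sizes, the 500 ms timeout, and the use of a CompletableFuture to stand in for the pending TRSPs are all hypothetical. It shows that a check-in queued behind blocked handlers in the same pool never runs, while a check-in served by a separate pool (the role ADMIN_QOS handlers would play) gets through and unblocks the waiters.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation; none of these names come from HBase.
public class HandlerStarvationSketch {

  /** Returns true if the simulated regionserver check-in is served within the timeout. */
  public static boolean checkInSucceeds(boolean useSeparateAdminPool) throws Exception {
    int normalHandlers = 2; // assumed size of the "normal priority" pool
    ExecutorService normalPool = Executors.newFixedThreadPool(normalHandlers);
    ExecutorService adminPool = Executors.newFixedThreadPool(1);
    // Stands in for the pending TRSP futures: completes only once the
    // regionserver has checked in again.
    CompletableFuture<Void> regionServerBack = new CompletableFuture<>();
    CountDownLatch handlersBusy = new CountDownLatch(normalHandlers);
    try {
      // Simulated moveRegion handlers: each occupies a normal handler and
      // waits on the future without a timeout.
      for (int i = 0; i < normalHandlers; i++) {
        normalPool.submit(() -> {
          handlersBusy.countDown();
          regionServerBack.join();
        });
      }
      handlersBusy.await(); // every normal handler is now blocked

      // Simulated regionServerStartup / regionServerReport call.
      ExecutorService checkInPool = useSeparateAdminPool ? adminPool : normalPool;
      Future<?> checkIn = checkInPool.submit(() -> {
        regionServerBack.complete(null);
      });
      try {
        checkIn.get(500, TimeUnit.MILLISECONDS);
        return true;
      } catch (TimeoutException e) {
        return false; // the check-in is stuck behind the blocked handlers
      }
    } finally {
      regionServerBack.complete(null); // release the blocked handlers
      normalPool.shutdownNow();
      adminPool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("shared pool check-in served: " + checkInSucceeds(false));
    System.out.println("separate pool check-in served: " + checkInSucceeds(true));
  }
}
```

      In this sketch the only difference between the two runs is which pool serves the check-in, which mirrors the proposal to move the regionserver operational RPCs off the normal priority handlers.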

            People

              Assignee: Andrew Kyle Purtell (apurtell)
              Reporter: Andrew Kyle Purtell (apurtell)