HBASE-28128: Reject requests at RPC layer when RegionServer is aborting

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.6.0, 2.5.6, 3.0.0-beta-1
    • Component/s: None
    • Labels: None

    Description

      We recently had an operational incident in which a RegionServer was aborted but failed to exit within a reasonable timeframe. We're going to tune hbase.regionserver.abort.timeout much lower than the 20m default, but even with that change it makes little sense to accept requests while the server is aborting.

      In our case, the server was impaired and not processing requests. The call queue was full, so NettyRpcServer kept trying and failing to add requests to the queue. Each failure results in a CallQueueTooBigException, which is not a meta cache clearing exception, so clients have no reason to drop their cached region locations and look elsewhere. The server continued throwing these exceptions for multiple minutes until we finally killed it manually.

      I'd like to add a check in ServerRpcConnection.processRequest: if regionServer.isAborted() returns true, throw a RegionServerAbortedException rather than attempt to enqueue the request.
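
      To make the proposed ordering concrete, here is a minimal, self-contained sketch of "check the abort flag before touching the call queue". It is not the actual HBase patch: ToyRpcServer, ServerAbortedException, and QueueFullException are hypothetical stand-ins for ServerRpcConnection, RegionServerAbortedException, and CallQueueTooBigException, and the bounded queue is a plain ArrayBlockingQueue.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class AbortCheckSketch {

  /** Hypothetical stand-in for the "server is aborting" rejection. */
  static class ServerAbortedException extends Exception {
    ServerAbortedException(String msg) { super(msg); }
  }

  /** Hypothetical stand-in for the "call queue is full" rejection. */
  static class QueueFullException extends Exception {
    QueueFullException(String msg) { super(msg); }
  }

  /** Toy server: a bounded call queue plus an abort flag. */
  static class ToyRpcServer {
    private final BlockingQueue<String> callQueue;
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    ToyRpcServer(int queueCapacity) {
      this.callQueue = new ArrayBlockingQueue<>(queueCapacity);
    }

    void abort() { aborted.set(true); }
    boolean isAborted() { return aborted.get(); }
    int queuedCalls() { return callQueue.size(); }

    void processRequest(String call) throws ServerAbortedException, QueueFullException {
      // Check the abort flag first, so an aborting server never touches the queue.
      if (isAborted()) {
        throw new ServerAbortedException("server is aborting, rejecting " + call);
      }
      // offer() returns false instead of blocking when the queue is at capacity.
      if (!callQueue.offer(call)) {
        throw new QueueFullException("call queue is full, rejecting " + call);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    ToyRpcServer server = new ToyRpcServer(1);
    server.processRequest("get-1");      // accepted: queue has room
    try {
      server.processRequest("get-2");    // rejected: queue is full
    } catch (QueueFullException e) {
      System.out.println("before abort: " + e.getMessage());
    }
    server.abort();
    try {
      server.processRequest("get-3");    // rejected up front: server is aborting
    } catch (ServerAbortedException e) {
      System.out.println("after abort: " + e.getMessage());
    }
  }
}
```

      In this sketch, a request that arrives after abort() fails with the abort-specific exception even though the queue is still full, so the caller sees a distinct signal instead of yet another queue-full error.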

            People

              Assignee: bbeaudreault Bryan Beaudreault
              Reporter: bbeaudreault Bryan Beaudreault