  1. HBase
  2. HBASE-22287

Infinite retries on failed server in RSProcedureDispatcher

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.3.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note: Add backoff. Avoid retrying every 100ms.

    Description

      We observed this recently on a cluster. I'm still investigating the root cause; however, it seems the retries should have special handling for this exception and, separately, probably a cap on the number of retries.

      2019-04-20 04:24:27,093 WARN  [RSProcedureDispatcher-pool4-t1285] procedure.RSProcedureDispatcher: request to server ,17020,1555742560432 failed due to java.io.IOException: Call to :17020 failed on local exception: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: :17020, try=26603, retrying...
      

      The corresponding worker is stuck.
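
      The two ideas raised above (a backoff instead of a fixed 100 ms retry interval, and a cap on the number of retries) can be sketched roughly as follows. This is only an illustration, not the committed patch: the class RetryBackoff, its constants other than the 100 ms base, and the nested FailedServerException stand-in (used so the sketch compiles without HBase on the classpath) are all hypothetical.

```java
import java.io.IOException;

public class RetryBackoff {
    // 100 ms matches the retry interval mentioned in the release note.
    public static final long BASE_DELAY_MS = 100L;
    // Hypothetical ceiling on a single wait; not taken from the HBase patch.
    public static final long MAX_DELAY_MS = 60_000L;
    // Hypothetical cap on attempts; not taken from the HBase patch.
    public static final int MAX_ATTEMPTS = 10;

    /** Stand-in for org.apache.hadoop.hbase.ipc.FailedServerException. */
    public static class FailedServerException extends IOException {
        public FailedServerException(String msg) {
            super(msg);
        }
    }

    /** Walks the cause chain, since the log shows the exception wrapped in an IOException. */
    public static boolean causedByFailedServer(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof FailedServerException) {
                return true;
            }
        }
        return false;
    }

    /** Exponential backoff: 100 ms, 200 ms, 400 ms, ... capped at MAX_DELAY_MS. */
    public static long delayMillis(int attempt) {
        // Clamp the shift so the multiplication cannot overflow a long.
        int shift = Math.min(Math.max(attempt, 0), 20);
        return Math.min(BASE_DELAY_MS << shift, MAX_DELAY_MS);
    }

    /** True while the attempt count is still under the cap. */
    public static boolean shouldRetry(int attempt) {
        return attempt < MAX_ATTEMPTS;
    }

    public static void main(String[] args) {
        IOException error = new IOException(
            "Call failed on local exception",
            new FailedServerException("This server is in the failed servers list"));
        System.out.println("failed server? " + causedByFailedServer(error));
        for (int attempt = 0; shouldRetry(attempt); attempt++) {
            System.out.println("try=" + attempt + ", wait " + delayMillis(attempt) + " ms");
        }
        System.out.println("giving up after " + MAX_ATTEMPTS + " attempts");
    }
}
```

      What a dispatcher should do once the cap is reached (for example, fail the remote procedure or hand the server to crash handling) is left open here; the sketch only shows how the wait grows and where a cap would stop the loop.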

            People

              Assignee: Michael Stack (stack)
              Reporter: Sergey Shelukhin (sershe)
              Votes: 0
              Watchers: 5
