KUDU-1395

Scanner KeepAlive requests can get starved on an overloaded server


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.8.0
    • Fix Version/s: None
    • Component/s: impala, rpc, tserver
    • Labels:

      Description

      As of 0.8.0, the RPC system schedules RPCs on an earliest-deadline-first basis and rejects those with later deadlines. This works well for RPCs that are retried on SERVER_TOO_BUSY errors: each retry keeps the original deadline, so the call gains priority as it gets closer to timing out.
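A rough sketch of the scheduling policy described above, not Kudu's actual implementation. It assumes a bounded queue that, once full, rejects whichever call has the latest deadline; `EdfQueue`, `capacity`, and the call names are hypothetical and exist only for illustration.

```python
import heapq


class EdfQueue:
    """Toy bounded RPC queue: earliest deadline is served first, and on
    overflow the call with the latest deadline is rejected."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed fixed queue size
        self._heap = []           # min-heap of (deadline, name)

    def offer(self, deadline, name):
        """Enqueue a call. Returns the name of the call that was rejected
        (the SERVER_TOO_BUSY case), or None if nothing was rejected."""
        heapq.heappush(self._heap, (deadline, name))
        if len(self._heap) <= self.capacity:
            return None
        victim = max(self._heap)      # latest deadline loses
        self._heap.remove(victim)
        heapq.heapify(self._heap)     # restore heap order after removal
        return victim[1]

    def take(self):
        """Dequeue the call with the earliest deadline."""
        return heapq.heappop(self._heap)[1]


q = EdfQueue(capacity=2)
rejected = [
    q.offer(10.0, "write-A"),
    q.offer(12.0, "write-B"),
    q.offer(30.0, "keepalive"),   # latest deadline, rejected on arrival
    q.offer(11.0, "scan-retry"),  # close to its deadline, displaces write-B
]
```

In this toy run the fresh call with the far-off deadline is turned away immediately, while the call that is close to its deadline is admitted at the expense of a queued call with a later one.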

      Scanner KeepAlive RPCs, however, are never retried. If a KeepAlive arrives at a heavily overloaded tserver, it will likely be rejected and is not sent again. As a result, Impala queries and other long scans that rely on KeepAlives are likely to fail on overloaded clusters, because the KeepAlive never gets through.
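The difference between a retried RPC and a single-shot KeepAlive can be sketched with a toy model. Everything here is an assumption made for illustration rather than Kudu code: `server_admits` stands in for an overloaded server that only admits calls within `slack` seconds of their deadline, and `send` is a hypothetical client loop.

```python
def server_admits(now, deadline, slack=2.0):
    """Hypothetical overloaded server: a call is admitted only when it is
    within `slack` seconds of its deadline, i.e. urgent enough to beat
    the queued backlog."""
    return deadline - now <= slack


def send(start, timeout, max_attempts, backoff=1.0):
    """Return the time at which the call is admitted, or None if every
    attempt is rejected. The deadline is computed once and reused by
    every retry, so later attempts are more urgent."""
    deadline = start + timeout
    now = start
    for _ in range(max_attempts):
        if now >= deadline:
            break                     # timed out on the client side
        if server_admits(now, deadline):
            return now
        now += backoff                # wait, then retry with the same deadline
    return None


# A call that retries on SERVER_TOO_BUSY eventually becomes urgent enough.
retried = send(start=0.0, timeout=10.0, max_attempts=20)

# A KeepAlive gets exactly one attempt, far from its deadline.
keepalive = send(start=0.0, timeout=10.0, max_attempts=1)
```

Under these assumptions the retried call is admitted on a late attempt, while the single-shot KeepAlive is rejected once and never gets another chance.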

            People

            • Assignee: Todd Lipcon (tlipcon)
            • Reporter: Todd Lipcon (tlipcon)
            • Votes: 0
            • Watchers: 4
