Uploaded image for project: 'Kudu'
  1. Kudu
  2. KUDU-1395

Scanner KeepAlive requests can get starved on an overloaded server

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 0.8.0
    • None
    • impala, rpc, tserver

    Description

      As of 0.8.0, the RPC system schedules RPCs on an earliest-deadline-first basis, rejecting those with later deadlines. This works well for RPCs which are retried on SERVER_TOO_BUSY errors, since the retries maintain the original deadline and thus get higher and higher priority as they get closer to timing out.

      We don't, however, do any retries on scanner KeepAlive RPCs. So, if a keepalive RPC arrives at a heavily overloaded tserver, it will likely get rejected, and won't retry. This means that Impala queries or other long scans that rely on KeepAlives will likely fail on overloaded clusters since the KeepAlive never gets through.

      Attachments

        Activity

          People

            tlipcon Todd Lipcon
            tlipcon Todd Lipcon
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated: