IMPALA-5558

Query hang after coordinator crash because DoRpc(ReportExecStatus) fails and is not retried

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.10.0
    • Component/s: Distributed Exec
    • Labels:
      None

      Description

      The following loop is meant to retry the RPC up to 3 times when reporting the exec status of a fragment instance to the coordinator. However, it is not very effective because it does not check out a new client between retries, so if the connection is bad, every retry fails the same way. In addition, since we are only reporting the query profile, it should be safe to always retry, even if the payload was partially sent to the coordinator.

      cc'ing Henry Robinson and Sailesh Mukil

        // Try to send the RPC 3 times before failing.
        for (int i = 0; i < 3; ++i) {
          rpc_status = coord.DoRpc(
              &ImpalaBackendClient::ReportExecStatus, params, &res, &retry_is_safe);
          if (rpc_status.ok()) break;
          if (!retry_is_safe) break;
          if (i < 2) SleepForMs(RETRY_SLEEP_MS);
        }
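
One way to address the first point is to drop the possibly broken connection and obtain a fresh one before each retry. The following is only a minimal, self-contained sketch of that idea, not Impala's actual fix: `RpcStatus`, `Connection`, and `SendWithReopen` are hypothetical names introduced for illustration, and the `reopen` callback stands in for whatever mechanism the client cache provides for getting a new connection. It also assumes the RPC is safe to resend, as argued above for status reports.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for an RPC result; not Impala's Status class.
struct RpcStatus {
  bool success;
  bool ok() const { return success; }
};

// Hypothetical connection wrapper. `send` issues one RPC attempt and
// `reopen` replaces the underlying connection with a fresh one.
struct Connection {
  std::function<RpcStatus()> send;
  std::function<RpcStatus()> reopen;
};

// Sends an RPC that is assumed safe to resend, trying up to max_attempts
// times. After each failed attempt (except the last) it sleeps and then
// reopens the connection, so a stale connection is not reused.
inline RpcStatus SendWithReopen(Connection& conn, int max_attempts,
                                int sleep_ms) {
  RpcStatus status{false};  // stays failed if max_attempts <= 0
  for (int i = 0; i < max_attempts; ++i) {
    status = conn.send();
    if (status.ok()) break;
    if (i == max_attempts - 1) break;  // out of attempts, report the failure
    std::this_thread::sleep_for(std::chrono::milliseconds(sleep_ms));
    // A failed reopen is not fatal here: the next send() simply fails again
    // and the loop gets another chance to reconnect.
    conn.reopen();
  }
  return status;
}
```

With a `send` callback that fails until the connection has been reopened once, `SendWithReopen(conn, 3, 0)` succeeds on the second attempt. Under the same conditions the original loop would fail all three attempts because it keeps using the same connection.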
      

              People

              • Assignee: Michael Ho (kwho)
              • Reporter: Michael Ho (kwho)