  1. IMPALA
  2. IMPALA-5558

Query hang after coordinator crash because DoRpc(ReportExecStatus) fails and is not retried

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.9.0
    • Fix Version/s: Impala 2.10.0
    • Component/s: Distributed Exec
    • Labels: None

    Description

      The following loop tries to send the RPC up to 3 times when reporting the exec status of a fragment instance to the coordinator. However, it is not very effective because it does not check out a new client between retries: if the connection is bad, the retry fails again on the same connection. In addition, since we are only reporting the query profile, it should be fine to always retry, even if the payload was partially sent to the remote client.

      cc'ing henryr and sailesh

        // Try to send the RPC 3 times before failing.
        for (int i = 0; i < 3; ++i) {
          rpc_status = coord.DoRpc(
              &ImpalaBackendClient::ReportExecStatus, params, &res, &retry_is_safe);
          if (rpc_status.ok()) break;
          if (!retry_is_safe) break;
          if (i < 2) SleepForMs(RETRY_SLEEP_MS);
        }
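
      The change suggested above (check out a new client before each retry, and retry regardless of whether part of the payload was already sent) could look roughly like the following. This is only a minimal standalone sketch, not the actual Impala patch: `Connection`, `ConnectionFactory` and `ReportWithFreshClient` are hypothetical names that stand in for the real client cache and `DoRpc` machinery.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <memory>
#include <thread>

// Hypothetical stand-in for a cached client connection to the coordinator.
struct Connection {
  bool healthy;
  // Returns true if the status report was delivered.
  bool SendReport() const { return healthy; }
};

// Hypothetical factory: every call checks out (or opens) a connection.
using ConnectionFactory = std::function<std::unique_ptr<Connection>()>;

// Sends the report up to max_attempts times. A new connection is checked
// out before every attempt, so a bad connection is never reused.
// There is no "retry_is_safe" check: the report is assumed to be safe to
// resend even if an earlier attempt was partially delivered.
// Returns the number of attempts used on success, or -1 if all failed.
int ReportWithFreshClient(const ConnectionFactory& get_connection,
                          int max_attempts, int retry_sleep_ms) {
  assert(max_attempts > 0);
  for (int i = 0; i < max_attempts; ++i) {
    std::unique_ptr<Connection> conn = get_connection();
    if (conn != nullptr && conn->SendReport()) return i + 1;
    // conn is destroyed at the end of this iteration, which drops the
    // failed connection. Sleep only if another attempt follows.
    if (i < max_attempts - 1) {
      std::this_thread::sleep_for(std::chrono::milliseconds(retry_sleep_ms));
    }
  }
  return -1;
}
```

      With a factory whose first two connections are broken and whose third is healthy, `ReportWithFreshClient(factory, 3, RETRY_SLEEP_MS)` succeeds on the third attempt, whereas the original loop would keep failing on the same bad connection.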
      

            People

              Assignee: kwho Michael Ho
              Reporter: kwho Michael Ho
              Votes: 0
              Watchers: 5
