[IMPALA-10114] Consider using num_rows_fetched instead of fetched_rows in checking whether client has fetched any results in TryQueryRetry - ASF JIRA

Details

Type: Sub-task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Backend
Labels: None

Description

QueryDriver::TryQueryRetry uses fetched_rows to check whether the client has fetched any results. If it is true, the retry is skipped. Code snippet:

    lock_guard<mutex> l(*client_request_state->lock());

    // Queries can only be retried if no rows for the query have been fetched
    // (IMPALA-9225).
    if (client_request_state->fetched_rows()) {
      string err_msg = Substitute("Skipping retry of query_id=$0 because the client has "
                                  "already fetched some rows",
          PrintId(query_id));
      VLOG_QUERY << err_msg;
      error->AddDetail(err_msg);
      return;
    }

https://github.com/apache/impala/blob/568b3394b2945d684d8fdb6c4f4e1f33cbf98898/be/src/runtime/query-driver.cc#L100

However, fetched_rows can be true even though the client has not actually received any rows, because the fetch timed out. For example, the following query takes more than 10s to materialize the first row batch after it enters the FINISHED state:

select * from functional.alltypes where bool_col = sleep(600)
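The mismatch can be illustrated with a small stand-alone model. This is only a sketch: FlagVsCountState, MarkFetchStarted, AddRowsReturned and SimulateTimedOutFetch are hypothetical names, and it assumes that the flag is set when a fetch request arrives while the counter only advances when rows are actually returned. It is not Impala's ClientRequestState.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for ClientRequestState; not the real Impala class.
class FlagVsCountState {
 public:
  // Assumption: the flag flips as soon as the client issues a fetch.
  void MarkFetchStarted() {
    std::lock_guard<std::mutex> l(lock_);
    fetched_rows_ = true;
  }

  // Assumption: the counter grows only by the rows actually handed back.
  void AddRowsReturned(int64_t n) {
    assert(n >= 0);
    std::lock_guard<std::mutex> l(fetch_rows_lock_);
    num_rows_fetched_ += n;
  }

  bool fetched_rows() {
    std::lock_guard<std::mutex> l(lock_);
    return fetched_rows_;
  }

  int64_t num_rows_fetched() {
    std::lock_guard<std::mutex> l(fetch_rows_lock_);
    return num_rows_fetched_;
  }

 private:
  std::mutex lock_;             // guards fetched_rows_
  bool fetched_rows_ = false;
  std::mutex fetch_rows_lock_;  // guards num_rows_fetched_
  int64_t num_rows_fetched_ = 0;
};

// Models a fetch that times out before the first row batch is ready:
// the flag is set, but zero rows reach the client.
inline void SimulateTimedOutFetch(FlagVsCountState* state) {
  state->MarkFetchStarted();
  state->AddRowsReturned(0);
}

// The current check (flag) versus the proposed check (row count).
inline bool RetryAllowedByFlag(FlagVsCountState* state) {
  return !state->fetched_rows();
}
inline bool RetryAllowedByCount(FlagVsCountState* state) {
  return state->num_rows_fetched() == 0;
}
```

After SimulateTimedOutFetch, RetryAllowedByFlag returns false while RetryAllowedByCount still returns true, which is the case this issue wants to treat as retryable.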

stakiar points out that fetched_rows is protected by ClientRequestState::lock_ (which is held in TryQueryRetry), while num_rows_fetched_ is protected only by ClientRequestState::fetch_rows_lock_, which is not held in TryQueryRetry. We need to sort out the locking logic before switching to num_rows_fetched.
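One possible direction, sketched here purely as an assumption and not as a proposed patch, is to make the counter atomic so that the retry path can read it while holding only lock_. AtomicCountState and CanRetry are hypothetical names. The member names mirror the issue text but the class is a simplified model.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical model of the two-lock layout described above.
class AtomicCountState {
 public:
  std::mutex* lock() { return &lock_; }

  // Fetch path: still serialized by fetch_rows_lock_, but the count is
  // published atomically so other threads can read it without that lock.
  void AddRowsReturned(int64_t n) {
    assert(n >= 0);
    std::lock_guard<std::mutex> l(fetch_rows_lock_);
    num_rows_fetched_.fetch_add(n, std::memory_order_release);
  }

  // Readable without fetch_rows_lock_ because the counter is atomic.
  int64_t num_rows_fetched() const {
    return num_rows_fetched_.load(std::memory_order_acquire);
  }

 private:
  std::mutex lock_;
  std::mutex fetch_rows_lock_;
  std::atomic<int64_t> num_rows_fetched_{0};
};

// Retry-side check: holds only lock_, as TryQueryRetry does in the snippet.
inline bool CanRetry(AtomicCountState* state) {
  std::lock_guard<std::mutex> l(*state->lock());
  return state->num_rows_fetched() == 0;
}
```

An atomic read only avoids the data race on the counter. It does not by itself stop a fetch from returning rows right after the check passes, so the interaction between the two locks would still need to be worked out.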

People

Assignee: Unassigned

Reporter: Quanlong Huang

Votes: 0

Watchers: 2

Dates

Created: 28/Aug/20 00:12

Updated: 22/Dec/20 19:50