IMPALA-6564

Queries randomly fail with "CANCELLED" due to a race with IssueInitialRanges()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Not A Bug
    • Affects Version/s: Impala 2.12.0
    • Fix Version/s: None
    • Component/s: Backend
    • Labels:

      Description

      I've been chasing a flaky test that I saw in test_basic_runtime_filters when running against https://gerrit.cloudera.org/#/c/8966/ (the scanner buffer pool changes).

      I think it is a latent bug that has started reproducing more frequently. What I've found is:

      • Different queries, across a variety of file formats, fail with CANCELLED. On my branch it reproduces roughly 3 runs out of 4 with: impala-py.test tests/query_test/test_runtime_filters.py -n8 --verbose --maxfail 1 -k basic
      • It seems to happen when all files are pruned out by runtime filters.
      • Logging reveals IssueInitialRanges() fails with a CANCELLED status, which propagates up to the query status:
          if (!initial_ranges_issued_) {
            // We do this in GetNext() to maximise the amount of work we can do while waiting for
            // runtime filters to show up. The scanner threads have already started (in Open()),
            // so we need to tell them there is work to do.
            // TODO: This is probably not worth splitting the organisational cost of splitting
            // initialisation across two places. Move to before the scanner threads start.
            Status status = IssueInitialScanRanges(state);
            if (!status.ok()) {
              LOG(INFO) << runtime_state_->fragment_instance_id()
                        << " IssueInitialRanges() failed with status: " << status.GetDetail()
                        << " " << (void*) this;
            }
        
      • The CANCELLED status appears to come from DiskIoMgr::AddScanRanges().
      • That function returned CANCELLED because a scanner thread had already noticed that the scan was complete at the point shown below, and cancelled the RequestContext:
            // Done with range and it completed successfully
            if (progress_.done()) {
              // All ranges are finished.  Indicate we are done.
              LOG(INFO) << runtime_state_->fragment_instance_id() << " All ranges done " << (void*) this;
              SetDone();
              break;
            }
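
The interleaving described above can be modelled with a small single-threaded sketch. Everything in it is a hypothetical stand-in, not Impala code: RequestContextModel, ScanNodeModel and SimulateIssueInitialRanges() are made-up names, and the assumption that the window sits between pruning the files and calling AddScanRanges() is one plausible reading of the logs rather than something confirmed in this report.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified status; the real code uses impala::Status.
enum class StatusCode { OK, CANCELLED };

// Stand-in for the I/O request context: once cancelled, it rejects new ranges.
class RequestContextModel {
 public:
  void Cancel() { cancelled_ = true; }
  StatusCode AddScanRanges(const std::vector<std::string>& ranges) {
    if (cancelled_) return StatusCode::CANCELLED;
    queued_ += ranges.size();
    return StatusCode::OK;
  }

 private:
  bool cancelled_ = false;
  std::size_t queued_ = 0;
};

// Stand-in for the scan node, split into the steps that can interleave.
class ScanNodeModel {
 public:
  explicit ScanNodeModel(std::size_t total_files) : remaining_(total_files) {}

  // GetNext() side, step 1: runtime filters prune files, which counts as progress.
  void PruneFiles(std::size_t num_pruned) {
    remaining_ -= std::min(num_pruned, remaining_);
  }

  // GetNext() side, step 2: hand the surviving ranges to the I/O layer.
  StatusCode AddSurvivingRanges(const std::vector<std::string>& ranges) {
    return ctx_.AddScanRanges(ranges);
  }

  // Scanner-thread side: once progress is complete, mark the scan done,
  // which cancels the request context (the SetDone() step in the excerpt).
  void ScannerThreadCheck() {
    if (remaining_ == 0 && !done_) {
      done_ = true;
      ctx_.Cancel();
    }
  }

 private:
  std::size_t remaining_;
  bool done_ = false;
  RequestContextModel ctx_;
};

// Replays one ordering. If the scanner thread runs inside the window, the
// later AddScanRanges() call sees a context that is already cancelled.
StatusCode SimulateIssueInitialRanges(bool scanner_runs_in_window) {
  ScanNodeModel node(2);
  node.PruneFiles(2);  // all files pruned by runtime filters
  if (scanner_runs_in_window) node.ScannerThreadCheck();
  StatusCode status = node.AddSurvivingRanges({});
  node.ScannerThreadCheck();  // scanner thread runs late in the other ordering
  return status;
}
```

In this model SimulateIssueInitialRanges(false) returns OK and SimulateIssueInitialRanges(true) returns CANCELLED, which would match the intermittent failures if the real window is where the sketch assumes it is.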
        

              People

              • Assignee: Tim Armstrong (tarmstrong)
              • Reporter: Tim Armstrong (tarmstrong)