IMPALA-7517

Hung scanner when soft memory limit exceeded


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 3.1.0
    • Fix Version/s: Impala 3.1.0
    • Component/s: Backend
    • Labels: None

    Description

      As reported on the mailing list, this is a regression introduced by IMPALA-7096 (commit 7ccf7369085aa49a8fc0daf6f91d97b8a3135682). The scanner thread has the following code:

          // Stop extra threads if we're over a soft limit in order to free up memory.
          if (!first_thread && mem_tracker_->AnyLimitExceeded(MemLimit::SOFT)) {
            break;
          }

          // Done with range and it completed successfully
          if (progress_.done()) {
            // All ranges are finished.  Indicate we are done.
            SetDone();
            break;
          }

          if (scan_range == nullptr && num_unqueued_files == 0) {
            unique_lock<mutex> l(lock_);
            // All ranges have been queued and DiskIoMgr has no more new ranges for this scan
            // node to process. This means that every range is either done or being processed by
            // another thread.
            all_ranges_started_ = true;
            break;
          }
        }
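
      What matters for this bug is the precedence among these three exits. As a minimal C++14 sketch (not Impala code: `IterationState`, `LoopAction`, and `DecideAsInExcerpt` are hypothetical names, and the member calls such as mem_tracker_->AnyLimitExceeded() and progress_.done() are replaced by plain booleans), the checks can be modeled as a pure function evaluated in the same order as the excerpt:

```cpp
// Hypothetical model of the three exit checks in the excerpt, evaluated in the
// same order. Names mirror the excerpt; this is an illustration, not Impala code.
enum class LoopAction { kContinue, kBreakSoftLimit, kSetDoneAndBreak, kAllRangesStarted };

struct IterationState {
  bool first_thread;         // stands in for 'first_thread'
  bool soft_limit_exceeded;  // stands in for mem_tracker_->AnyLimitExceeded(MemLimit::SOFT)
  bool progress_done;        // stands in for progress_.done()
  bool got_scan_range;       // stands in for scan_range != nullptr
  int num_unqueued_files;    // stands in for num_unqueued_files
};

constexpr LoopAction DecideAsInExcerpt(const IterationState& s) {
  // 1. Extra (non-first) threads stop as soon as any soft memory limit is exceeded.
  if (!s.first_thread && s.soft_limit_exceeded) return LoopAction::kBreakSoftLimit;
  // 2. Reached only if check 1 did not fire: finished scans call SetDone().
  if (s.progress_done) return LoopAction::kSetDoneAndBreak;
  // 3. No range obtained and nothing left to queue: other threads own the rest.
  if (!s.got_scan_range && s.num_unqueued_files == 0) return LoopAction::kAllRangesStarted;
  return LoopAction::kContinue;
}

// All ranges finished and a soft limit exceeded: the outcome depends only on
// whether this thread is the first thread.
static_assert(DecideAsInExcerpt({true, true, true, true, 0}) == LoopAction::kSetDoneAndBreak,
              "the first thread ignores the soft limit and calls SetDone()");
static_assert(DecideAsInExcerpt({false, true, true, true, 0}) == LoopAction::kBreakSoftLimit,
              "any other thread breaks before reaching the progress_.done() check");
```

      In this model, when every range is finished and a soft limit is exceeded, only the first thread reaches SetDone(); every other thread takes the soft-limit break first, so correctness depends on the first thread still being alive at that point.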

      What if we have the following scenario:

      T1) grab scan range 1 and start processing

      T2) grab scan range 2 and start processing

      T1) finish scan range 1 and see that progress_ is not done()
      T1) loop around, get no scan range (there are no more), so set all_ranges_started_ and break
      T1) thread exits

      T2) finish scan range 2
      T2) happen to exceed a soft memory limit due to memory pressure from other exec nodes, etc. Since T2 is not the first thread, it breaks, even though the first thread is no longer running.
      T2) thread exits

      Note that no thread ever calls SetDone(): T1 exited while progress_ was still not done(), and T2 breaks because of the soft memory limit before it reaches the progress_.done() check.

      Thus, the scan node is never marked done and the query hangs forever.
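
      The interleaving above can be replayed deterministically. The following is a minimal C++14 sketch, not Impala code: `CheckOrder`, `LoopExit`, `Step`, `EndOfIteration`, and `ReplayCallsSetDone` are hypothetical names, and the memory tracker and range bookkeeping are reduced to booleans and a counter. The `kProgressFirst` ordering (checking progress_.done() before the soft-limit break) is shown only as one illustrative way to avoid the hang in this interleaving; it is not necessarily the fix that was committed for this issue.

```cpp
// Hypothetical replay of the T1/T2 interleaving; illustration only, not Impala code.

// Which check runs first at the end of each loop iteration.
enum class CheckOrder { kSoftLimitFirst, kProgressFirst };

// How a thread leaves (or stays in) its loop after one iteration.
enum class LoopExit { kNone, kSoftLimit, kSetDone, kAllRangesStarted };

// One end-of-iteration step taken by one thread.
struct Step {
  bool first_thread;
  bool soft_limit_exceeded;
  bool finished_a_range;  // this iteration completed a scan range
  bool got_range;         // this iteration obtained a scan range
};

constexpr LoopExit EndOfIteration(CheckOrder order, bool first_thread, bool soft_limit_exceeded,
                                  bool progress_done, bool got_range, int num_unqueued_files) {
  const bool soft_break = !first_thread && soft_limit_exceeded;
  if (order == CheckOrder::kSoftLimitFirst && soft_break) return LoopExit::kSoftLimit;
  if (progress_done) return LoopExit::kSetDone;
  if (order == CheckOrder::kProgressFirst && soft_break) return LoopExit::kSoftLimit;
  if (!got_range && num_unqueued_files == 0) return LoopExit::kAllRangesStarted;
  return LoopExit::kNone;
}

// Returns true if any thread would call SetDone() during the replay.
constexpr bool ReplayCallsSetDone(CheckOrder order) {
  const Step steps[] = {
      {true, false, true, true},    // T1 finishes range 1; progress_ is not done yet
      {true, false, false, false},  // T1 loops, gets no range, sets all_ranges_started_, exits
      {false, true, true, true},    // T2 finishes range 2 while over a soft memory limit
  };
  const int kTotalRanges = 2;
  int finished = 0;
  for (const Step& step : steps) {
    if (step.finished_a_range) ++finished;
    const LoopExit outcome = EndOfIteration(order, step.first_thread, step.soft_limit_exceeded,
                                            finished == kTotalRanges, step.got_range,
                                            /*num_unqueued_files=*/0);
    if (outcome == LoopExit::kSetDone) return true;
  }
  return false;
}

static_assert(!ReplayCallsSetDone(CheckOrder::kSoftLimitFirst), "excerpt order: nobody calls SetDone()");
static_assert(ReplayCallsSetDone(CheckOrder::kProgressFirst), "progress check first: T2 calls SetDone()");
```

      With the excerpt's ordering, the replay ends without any SetDone() call, matching the hang described above; moving the progress check ahead of the soft-limit break makes T2 call SetDone() in this particular interleaving.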

          People

            Assignee: Todd Lipcon (tlipcon)
            Reporter: Todd Lipcon (tlipcon)
            Votes: 0
            Watchers: 3
