IMPALA-4383: Plan fragments may never send reports to coordinator

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Distributed Exec
    • Labels:
      None

      Description

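      There's a race between Open() and ReportProfile() on the report_thread_active_ member field. Open() sets the flag only after ReportProfile() has signalled report_thread_started_cv_, so ReportProfile() can test report_thread_active_ before it has been set to true. If that happens, ReportProfile() exits almost immediately and may never report to the coordinator.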

      This causes a problem if the coordinator has failed: the fragment instance will never detect the failure, and so will never cancel itself.

      The code usually doesn't hit this, because ReportProfile() takes long enough to reach the check that the unsynchronised write has normally become visible by then, but I started hitting it regularly in my test runs.

      PlanFragmentExecutor::Open()
        if (!report_status_cb_.empty() && FLAGS_status_report_interval > 0) {
          unique_lock<mutex> l(report_thread_lock_);
          report_thread_.reset(
              new Thread("plan-fragment-executor", "report-profile",
                  &PlanFragmentExecutor::ReportProfile, this));
          // make sure the thread started up, otherwise ReportProfile() might get into a race
          // with StopReportThread()
          report_thread_started_cv_.wait(l);
          report_thread_active_ = true;  // <<<<<< Set *after* CV fired by ReportProfile()
        }
      
      PlanFragmentExecutor::ReportProfile()
        unique_lock<mutex> l(report_thread_lock_);
        // <etc>
        report_thread_started_cv_.notify_one();
      
        // <etc> - this block yields lock_ and takes long enough for the write
        // to report_thread_active_ to usually become visible
      
        // VVVVVVV -- May execute before Open() sets it
        while (report_thread_active_) {
          //....
        }
        // Exit method
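
      One way to close a window like this is to publish the flag before the thread can look at it, and to read and write it only under the lock that the condition variable uses. The sketch below illustrates that pattern with the C++ standard library only. The class Reporter and all of its members are hypothetical stand-ins, not Impala code, and this is not necessarily how the linked commit fixed the issue.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the report-thread handshake; not Impala code.
class Reporter {
 public:
  ~Reporter() { Stop(); }

  // Starts the report thread and blocks until it has signalled that it is running.
  void Open() {
    std::unique_lock<std::mutex> l(lock_);
    assert(!thread_.joinable());  // this sketch assumes Open() is called at most once
    // Set the flag *before* the thread exists, under the lock the thread reads it with.
    active_ = true;
    thread_ = std::thread(&Reporter::ReportLoop, this);
    // The predicate form tolerates spurious wakeups and early notifications.
    started_cv_.wait(l, [this] { return started_; });
  }

  // Asks the report thread to exit and waits for it. Safe to call more than once.
  void Stop() {
    {
      std::lock_guard<std::mutex> l(lock_);
      active_ = false;
    }
    stop_cv_.notify_one();
    if (thread_.joinable()) thread_.join();
  }

  int reports_sent() const { return reports_sent_.load(); }

 private:
  void ReportLoop() {
    std::unique_lock<std::mutex> l(lock_);
    started_ = true;
    started_cv_.notify_one();
    // active_ was set before this thread was created, so the loop body runs
    // at least once unless Stop() has already been called.
    while (active_) {
      ++reports_sent_;  // stands in for sending a status report
      stop_cv_.wait_for(l, std::chrono::milliseconds(1));
    }
  }

  std::mutex lock_;
  std::condition_variable started_cv_;
  std::condition_variable stop_cv_;
  bool started_ = false;  // guarded by lock_
  bool active_ = false;   // guarded by lock_
  std::atomic<int> reports_sent_{0};
  std::thread thread_;
};
```

      In this sketch Open() cannot return until the report thread has released the lock for the first time, which happens only after it has entered the loop, so at least one report is counted by the time Open() returns.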
      

        Activity

        henryr Henry Robinson added a comment - Fixed in https://github.com/apache/incubator-impala/commit/d82411f81c746ec5df2f659e97e6b3ba4472676c

          People

          • Assignee:
            henryr Henry Robinson
            Reporter:
            henryr Henry Robinson