IMPALA-4383

Plan fragments may never send reports to coordinator


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Distributed Exec
    • Labels: None

    Description

      There's a race between Open() and ReportProfile() on the report_thread_active_ member field. If it is triggered, ReportProfile() sees the flag as false, exits immediately, and may never report to the coordinator.

      This is a problem if the coordinator has failed: the fragment instance will never detect the failure, and so will never cancel itself.

      The code usually doesn't hit this, because ReportProfile() takes long enough to reach its first check of the flag for the unsynchronised write to become visible, but I started hitting it regularly in my test runs.

      PlanFragmentExecutor::Open()
        if (!report_status_cb_.empty() && FLAGS_status_report_interval > 0) {
          unique_lock<mutex> l(report_thread_lock_);
          report_thread_.reset(
              new Thread("plan-fragment-executor", "report-profile",
                  &PlanFragmentExecutor::ReportProfile, this));
          // make sure the thread started up, otherwise ReportProfile() might get into a race
          // with StopReportThread()
          report_thread_started_cv_.wait(l);
          report_thread_active_ = true;  // <<<<<< Set *after* CV fired by ReportProfile()
        }
      
      PlanFragmentExecutor::ReportProfile()
        unique_lock<mutex> l(report_thread_lock_);
        // <etc>
        report_thread_started_cv_.notify_one();
      
        // <etc> - this block yields lock_ and takes long enough for the write
        // to report_thread_active_ to usually become visible
      
        // VVVVVVV -- May execute before Open() sets it
        while (report_thread_active_) {
          //....
        }
        // Exit method
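
      One way to close a race of this shape is to publish the flag before the reporting thread is created, and to wait on the condition variable with a predicate. The sketch below is not the patch that resolved this issue. It is a minimal standalone illustration that uses the C++ standard library instead of Impala's Thread wrapper. FragmentReporter, report_thread_started_, stop_report_thread_cv_, num_reports_ and ReportsAfterImmediateStop() are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for PlanFragmentExecutor; only the report-thread
// handshake is modelled here.
class FragmentReporter {
 public:
  ~FragmentReporter() { StopReportThread(); }

  void Open() {
    std::unique_lock<std::mutex> l(report_thread_lock_);
    // Set the flag before the thread exists, so the loop in ReportProfile()
    // cannot observe 'false' unless StopReportThread() has really been called.
    report_thread_active_ = true;
    report_thread_ = std::thread(&FragmentReporter::ReportProfile, this);
    // Predicate wait: robust against spurious wakeups and early notifications.
    report_thread_started_cv_.wait(l, [this] { return report_thread_started_; });
  }

  void StopReportThread() {
    {
      std::lock_guard<std::mutex> l(report_thread_lock_);
      if (!report_thread_active_) return;  // never started, or already stopped
      report_thread_active_ = false;
    }
    stop_report_thread_cv_.notify_one();
    report_thread_.join();
  }

  int num_reports() {
    std::lock_guard<std::mutex> l(report_thread_lock_);
    return num_reports_;
  }

 private:
  void ReportProfile() {
    std::unique_lock<std::mutex> l(report_thread_lock_);
    report_thread_started_ = true;
    report_thread_started_cv_.notify_one();
    while (report_thread_active_) {
      ++num_reports_;  // stands in for sending a status report
      // Releases the lock while sleeping; wakes early if asked to stop.
      stop_report_thread_cv_.wait_for(l, std::chrono::milliseconds(1));
    }
  }

  std::mutex report_thread_lock_;
  std::condition_variable report_thread_started_cv_;
  std::condition_variable stop_report_thread_cv_;
  std::thread report_thread_;
  bool report_thread_started_ = false;
  bool report_thread_active_ = false;
  int num_reports_ = 0;
};

// Usage: even when the executor is stopped straight after Open(), the
// reporting loop has run at least once.
inline int ReportsAfterImmediateStop() {
  FragmentReporter reporter;
  reporter.Open();
  reporter.StopReportThread();
  return reporter.num_reports();
}
```

      Because the flag is written under report_thread_lock_ before the thread is spawned, and ReportProfile() reads it under the same lock, the loop condition is true on first evaluation unless StopReportThread() has already run.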
      

          People

            Assignee: Henry Robinson (henryr)
            Reporter: Henry Robinson (henryr)