IMPALA-4495 (project: IMPALA)

Runtime filters are disabled based on stats before they even arrive, contributing to a performance cliff on TPC-H Q2


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 2.7.0, Impala 2.8.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend

    Description

      The logic for disabling runtime filters based on stats is faulty. The scanner evaluates runtime filters even before they arrive, and that evaluation always returns true. As a result, 'considered' is incremented but 'rejected' never is, so the reject ratio computed at the next effectiveness check is 0, which is below FLAGS_parquet_min_filter_reject_ratio. The filter-disabling logic therefore concludes that the filter is ineffective and disables it, and once disabled it is never re-enabled. However, there is no way to know whether a filter is ineffective before it arrives.

      bool HdfsParquetScanner::EvalRuntimeFilters(TupleRow* row) {
        int num_filters = filter_ctxs_.size();
        for (int i = 0; i < num_filters; ++i) {
          LocalFilterStats* stats = &filter_stats_[i];
          if (!stats->enabled) continue;
          const RuntimeFilter* filter = filter_ctxs_[i]->filter;
          // Check filter effectiveness every ROWS_PER_FILTER_SELECTIVITY_CHECK rows.
          // TODO: The stats updates and the filter effectiveness check are executed very
          // frequently. Consider hoisting them out of this loop, and doing an equivalent
          // check less frequently, e.g., after producing an output batch.
          ++stats->total_possible;
          if (UNLIKELY(
              !(stats->total_possible & (ROWS_PER_FILTER_SELECTIVITY_CHECK - 1)))) {
            double reject_ratio = stats->rejected / static_cast<double>(stats->considered);
            if (filter->AlwaysTrue() ||
                reject_ratio < FLAGS_parquet_min_filter_reject_ratio) {
              stats->enabled = 0;
              continue;
            }
          }
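          // If the filter has not arrived yet, Eval() below passes every row, so
          // 'considered' grows while 'rejected' stays 0 and the check above disables
          // the filter at its first effectiveness check.
          ++stats->considered;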
          void* e = filter_ctxs_[i]->expr->GetValue(row);
          if (!filter->Eval<void>(e, filter_ctxs_[i]->expr->root()->type())) {
            ++stats->rejected;
            return false;
          }
        }
        return true;
      }
      

      I was able to reproduce this easily on TPC-H Q2 at scale factor 20 with runtime_filter_arrival_wait_time_ms=1. I added logging to confirm that the filters were being disabled before they arrived:

      diff --git a/be/src/exec/hdfs-parquet-scanner.cc b/be/src/exec/hdfs-parquet-scanner.cc
      index 6b157aa..d1dfd92 100644
      --- a/be/src/exec/hdfs-parquet-scanner.cc
      +++ b/be/src/exec/hdfs-parquet-scanner.cc
      @@ -676,6 +676,8 @@ bool HdfsParquetScanner::EvalRuntimeFilters(TupleRow* row) {
             double reject_ratio = stats->rejected / static_cast<double>(stats->considered);
             if (filter->AlwaysTrue() ||
                 reject_ratio < FLAGS_parquet_min_filter_reject_ratio) {
      +        LOG(INFO) << "Disabling filter " << filter->id() << " HasBloomFilter " << filter->HasBloomFilter()
      +                  << " rejected " << stats->rejected << " considered " << stats->considered;
               stats->enabled = 0;
               continue;
             }
      
      I1116 09:42:14.286149 26839 hdfs-parquet-scanner.cc:679] Disabling filter 3 HasBloomFilter 0 rejected 0 considered 16383
      I1116 09:42:15.449862 26890 hdfs-parquet-scanner.cc:679] Disabling filter 0 HasBloomFilter 0 rejected 0 considered 16383
      I1116 09:42:15.449887 26890 hdfs-parquet-scanner.cc:679] Disabling filter 4 HasBloomFilter 1 rejected 0 considered 16383
      I1116 09:42:15.473232 26891 hdfs-parquet-scanner.cc:679] Disabling filter 0 HasBloomFilter 0 rejected 0 considered 16383
      I1116 09:42:15.473254 26891 hdfs-parquet-scanner.cc:679] Disabling filter 4 HasBloomFilter 1 rejected 0 considered 16383
      I1116 09:42:20.298415 26900 hdfs-parquet-scanner.cc:679] Disabling filter 6 HasBloomFilter 1 rejected 0 considered 16383
      

      Attachments

        1. TPC-DS-Q59.txt (847 kB, Mostafa Mokhtar)


          People

            Michael Ho (kwho)
            Tim Armstrong (tarmstrong)
            Votes: 0
            Watchers: 5
