IMPALA-12451

Cardinality underestimation can hurt bloom filter effectiveness

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 4.2.0
    • Fix Version/s: None
    • Component/s: Frontend

    Description

      The Impala planner selects the desired bloom filter size from the estimated NDV (number of distinct values) of the filtered values and a target false-positive probability (FPP, currently 0.75 by default). Starting from IMPALA-11924, the NDV itself is estimated as the minimum of the input cardinality going into the join build side and the column's stats NDV.

      If the planner underestimates the input cardinality, it can select a bloom filter size that is too small for the actual NDV seen at execution, rendering the filter ineffective (high actual false-positive rate). An example of this can be observed at RF004 of Q53 from a TPC-DS 3TB run with RUNTIME_FILTER_MIN_SIZE=8KB (53.txt).

      To be specific:

      query | filter | column    | stats NDV | est. cardinality | selected size | actual cardinality | NDV-based min size
      Q53   | RF004  | i_item_sk | 360000    | 51               | 8KB (2^13)    | 18.53K             | 128KB (2^17)
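
The sizing rule described above can be sketched as follows. This is a minimal illustration, not Impala's planner code: it assumes the classic Bloom filter approximation fpp = (1 - exp(-k*ndv/m))^k with k = 8 probes per key and power-of-two filter sizes, and the names `estimate_ndv` and `min_filter_bytes` are hypothetical. Under those assumptions it appears to reproduce the 8KB and 128KB sizes listed for RF004.

```python
import math

K = 8                        # assumed number of probes per key (not taken from the issue)
MIN_FILTER_BYTES = 8 * 1024  # mirrors RUNTIME_FILTER_MIN_SIZE=8KB from the run above


def estimate_ndv(build_cardinality, stats_ndv):
    """NDV estimate as described in the issue: min of build-side cardinality and stats NDV."""
    return min(build_cardinality, stats_ndv)


def min_filter_bytes(ndv, target_fpp=0.75, min_bytes=MIN_FILTER_BYTES):
    """Smallest power-of-two filter size (bytes) whose modeled FPP stays at or below target_fpp."""
    ndv = max(1, ndv)
    # Solve (1 - exp(-K * ndv / m))^K <= target_fpp for m (in bits).
    bits_needed = -K * ndv / math.log(1.0 - target_fpp ** (1.0 / K))
    # Round up to a power of two, and never go below one byte (2^3 bits).
    log2_bits = max(3, math.ceil(math.log2(bits_needed)))
    return max(min_bytes, (1 << log2_bits) // 8)


if __name__ == "__main__":
    est = estimate_ndv(build_cardinality=51, stats_ndv=360_000)
    print("size from estimated NDV:", min_filter_bytes(est))
    print("size from stats NDV:    ", min_filter_bytes(360_000))
```

The gap between the two printed sizes is the effect of the cardinality underestimate: the min() rule lets the estimate of 51 rows override the much larger stats NDV.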

      For RF004, the cardinality underestimation can be attributed to a bad selectivity estimate on the build side of the join node that produces the filter. The actual cardinality of 18.53K is still within the limit of an 8KB bloom filter, but because the target FPP is 0.75, the filter still has a high actual false-positive rate and lets more rows pass through.
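
To get a rough feel for how the false-positive rate changes with filter size in this case, the sketch below evaluates the classic Bloom filter approximation for several sizes. It rests on assumptions that are not stated in the issue: k = 8 probes per key, and the actual cardinality of 18.53K treated as an upper bound on the number of distinct keys inserted. The function name `expected_fpp` is hypothetical, and a real blocked filter may have a somewhat higher rate than this model predicts.

```python
import math

K = 8  # assumed number of probes per key (not taken from the issue)


def expected_fpp(ndv, filter_bytes):
    """Classic approximation: fpp = (1 - exp(-K * ndv / m))^K, with m in bits."""
    bits = filter_bytes * 8
    return (1.0 - math.exp(-K * ndv / bits)) ** K


if __name__ == "__main__":
    actual_keys = 18_530  # actual cardinality of RF004, used as an upper bound on NDV
    for size_kb in (8, 16, 128):
        fpp = expected_fpp(actual_keys, size_kb * 1024)
        print(f"{size_kb:>4} KB -> modeled FPP {fpp:.4f}")
```

In this model the 8KB filter stays under the 0.75 target yet still passes a large fraction of non-matching rows, while each doubling of the size lowers the modeled rate sharply.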

      Getting a better bloom filter size will require fixing this selectivity estimate, lowering the target FPP below the current default (0.75), or adding an optimization that also considers the stats NDV when the cardinality estimate appears to be severely underestimated. 53_double_filter_size.txt shows that increasing the RF004 size can lead to better row filtering.

      Attachments

        1. 53.txt
          841 kB
          Riza Suminto
        2. 53_double_filter_size.txt
          846 kB
          Riza Suminto

            People

              Assignee: Unassigned
              Reporter: Riza Suminto (rizaon)