IMPALA-10112

Consider skipping FpRateTooHigh() check for bloom filters

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Impala 4.0.0
    • Component/s: Backend

    Description

      This check disables bloom filters on the sender side when their estimated false-positive rate is too high.

      The estimate is inaccurate when there are duplicate values of the filter key on the build side, e.g. in a many-to-many join or a join with multiple keys. This could be fixed with some effort, but it is probably not worth it, because:
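
      To illustrate why duplicates skew the check, here is a minimal sketch. It assumes the textbook bloom filter false-positive estimate, (1 - e^(-k*n/m))^k, and a hypothetical cutoff of 0.75. The function names, filter size, hash count and cutoff are illustrative assumptions, not Impala's actual code.

      ```python
      import math

      def expected_fp_rate(num_inserted, num_bits, num_hashes):
          """Textbook bloom filter false-positive estimate: (1 - e^(-k*n/m))^k."""
          return (1.0 - math.exp(-num_hashes * num_inserted / num_bits)) ** num_hashes

      def fp_rate_too_high(num_inserted, num_bits, num_hashes, max_fp_rate=0.75):
          """Hypothetical stand-in for the check; the 0.75 cutoff is an assumption."""
          return expected_fp_rate(num_inserted, num_bits, num_hashes) > max_fp_rate

      NUM_BITS = 1 << 20      # assumed filter size in bits
      NUM_HASHES = 8          # assumed number of hash probes per key

      build_rows = 2_000_000  # rows inserted on the build side, duplicates included
      distinct_keys = 20_000  # distinct filter key values among those rows

      # Counting every inserted row makes the filter look saturated...
      disabled_by_row_count = fp_rate_too_high(build_rows, NUM_BITS, NUM_HASHES)
      # ...while the distinct keys, which are all that set bits, fit comfortably.
      disabled_by_distinct_count = fp_rate_too_high(distinct_keys, NUM_BITS, NUM_HASHES)

      print(disabled_by_row_count, disabled_by_distinct_count)
      ```

      In this sketch the same filter is rejected when the estimate is driven by the number of inserted rows and accepted when it is driven by the number of distinct keys, which is the kind of discrepancy described above.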

      • Partition filters are probably still worth evaluating even if there are false positives, because evaluation is cheap and eliminating a partition is still beneficial.
      • Runtime filters are dynamically disabled on the scan side if they are ineffective. I think we still "evaluate" the always-true filter, which is cheaper than doing the hashing and bloom evaluation, but still not entirely free.
      • The disabling is fairly unlikely to kick in for partitioned joins, because the check is applied only to each sender's small subset of the filter, before the Or() operation.
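
      The last point can be sketched with a toy model. It assumes a bloom filter stored as a Python integer bitmask, keys split across fragments by a simple modulo, and a false-positive estimate derived from the fraction of set bits. All names and sizes are hypothetical and chosen only for illustration.

      ```python
      import hashlib

      NUM_BITS = 1024     # assumed (deliberately small) filter size
      NUM_HASHES = 4      # assumed number of hash probes per key
      NUM_FRAGMENTS = 16  # assumed number of sender fragments

      def bit_positions(key):
          """Derive NUM_HASHES bit positions for a key from one SHA-256 digest."""
          digest = hashlib.sha256(str(key).encode()).digest()
          return [int.from_bytes(digest[4 * i:4 * i + 4], "little") % NUM_BITS
                  for i in range(NUM_HASHES)]

      def build_filter(keys):
          """Build a partial filter as an integer bitmask."""
          bits = 0
          for key in keys:
              for pos in bit_positions(key):
                  bits |= 1 << pos
          return bits

      def estimated_fp_rate(bits):
          """Estimate the false-positive rate as (fraction of set bits)^k."""
          fill = bin(bits).count("1") / NUM_BITS
          return fill ** NUM_HASHES

      keys = range(2000)
      # Each fragment only sees its own share of the build-side keys.
      partials = [build_filter(k for k in keys if k % NUM_FRAGMENTS == f)
                  for f in range(NUM_FRAGMENTS)]

      # The merged filter is the bitwise OR of all partial filters.
      merged = 0
      for partial in partials:
          merged |= partial

      worst_partial_fp = max(estimated_fp_rate(p) for p in partials)
      merged_fp = estimated_fp_rate(merged)
      print(worst_partial_fp, merged_fp)
      ```

      Every partial filter looks healthy on its own, so a per-sender check would pass, while the merged filter is close to saturated.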

      So the check is potentially harmful and likely beneficial only for broadcast join filters, in which case it saves a small amount of scan CPU and, for global filters, coordinator RPCs and broadcasting. It's unclear that the complexity is worth it for this relatively small and uncertain benefit.

            People

              Assignee: Riza Suminto (rizaon)
              Reporter: Tim Armstrong (tarmstrong)