IMPALA-12357

Skip scheduling runtime filter from PK-FK join with full build scan

Details

    Description

      A PK-FK inner join between a dimension table and a fact table is common in queries. Often such a join has no predicate on the dimension table. The runtime filter built from this kind of dimension table scan (PK) therefore likely contains every value of the fact table column (FK). Generating this filter is ineffective because it is unlikely to reject any rows.

      The attached screenshot visualizes RF 50, 52, 60, and 62, which target 49:SCAN in TPC-DS Q64. These runtime filters come from full dimension table scans in PK-FK joins. In theory, these filters should not reject any probe rows. The query profile, however, shows that they can still reject some probe rows that have NULL values in their target column. Unfortunately, because NULL rows are few relative to non-NULL rows, the scanners of 49:SCAN still deemed all of these filters ineffective and disabled them.
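
      The scanner-side behavior described above can be sketched as a reject-ratio check. This is a minimal illustration, not Impala's code: the function name, the `min_reject_ratio` threshold, and the `min_rows` warm-up count are all assumptions made for the example.

```python
def should_disable_filter(rows_considered: int,
                          rows_rejected: int,
                          min_reject_ratio: float = 0.1,  # hypothetical threshold
                          min_rows: int = 1000) -> bool:   # hypothetical warm-up
    """Return True when a filter rejects too few rows to be worth evaluating."""
    if rows_considered < min_rows:
        # Not enough rows seen yet to judge the filter.
        return False
    return rows_rejected / rows_considered < min_reject_ratio


# A filter that only rejects a handful of NULL rows out of a million
# falls below the threshold and gets disabled.
disabled = should_disable_filter(rows_considered=1_000_000, rows_rejected=500)
```

      Under these assumed thresholds, a filter that rejects only rare NULL rows is disabled after it has already been built and shipped, which is the cost this issue proposes to avoid.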

      We can skip generating runtime filters that match all of these criteria:

      1. The build side is a full table scan.
      2. No runtime filter targets the build scan.
      3. There is a PK-FK constraint between the runtime filter's origin column on the build side and the target column on the probe side.
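
      The three criteria can be read as a single conjunction. The sketch below assumes a hypothetical `FilterCandidate` record that summarizes what the planner knows about one runtime filter; none of these names come from the Impala code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterCandidate:
    """Hypothetical summary of one runtime filter candidate."""
    build_is_full_scan: bool              # criterion 1: build scan has no predicates
    build_scan_has_incoming_filter: bool  # criterion 2 (negated): build scan is a filter target
    has_pk_fk_constraint: bool            # criterion 3: origin column is PK, target column is FK


def should_skip(candidate: FilterCandidate) -> bool:
    """Skip the filter only when all three criteria hold."""
    return (candidate.build_is_full_scan
            and not candidate.build_scan_has_incoming_filter
            and candidate.has_pk_fk_constraint)


# Full dimension scan, no filter on the build scan, declared PK-FK: skip.
skip = should_skip(FilterCandidate(True, False, True))
# Same join, but another filter prunes the build scan: keep the filter.
keep = not should_skip(FilterCandidate(True, True, True))
```

      Criterion 2 matters because a filter that prunes the build scan means the build side no longer holds every PK value, so the resulting filter may reject probe rows after all.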

      If the PK-FK constraint is not declared in the table schema, which is the case most of the time, criterion 3 can be replaced by checking the runtime filter's false positive probability (eliminate filters with a high false positive probability).
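
      As a rough illustration of the fallback, the sketch below estimates the false positive probability with the textbook Bloom filter approximation (1 - e^(-k*n/m))^k and substitutes it for criterion 3. The number of hash functions, the `max_fpp` cutoff, and all function names are assumptions for the example, not Impala's actual formula or settings.

```python
import math


def bloom_fpp(ndv: int, num_bits: int, num_hashes: int = 8) -> float:
    """Textbook Bloom filter false positive estimate for ndv distinct values."""
    return (1.0 - math.exp(-num_hashes * ndv / num_bits)) ** num_hashes


def should_skip_by_fpp(build_is_full_scan: bool,
                       build_scan_has_incoming_filter: bool,
                       ndv: int,
                       num_bits: int,
                       max_fpp: float = 0.75) -> bool:  # hypothetical cutoff
    """Criteria 1 and 2, with a high estimated FPP standing in for criterion 3."""
    return (build_is_full_scan
            and not build_scan_has_incoming_filter
            and bloom_fpp(ndv, num_bits) > max_fpp)


# A large unfiltered dimension table squeezed into a small filter saturates it.
saturated = should_skip_by_fpp(True, False, ndv=1_000_000, num_bits=8 * 1024)
# A small dimension table in a roomy filter keeps a low FPP and is not skipped.
roomy = should_skip_by_fpp(True, False, ndv=1_000, num_bits=1 << 20)
```

      The design choice here is that the FPP check needs only the build-side cardinality estimate and the filter size, both of which are available without any declared constraint.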

          People

            rizaon Riza Suminto
