  1. IMPALA
  2. IMPALA-6266

Runtime filters should not have non-deterministic expression on consumer side

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
    • Fix Version/s: None
    • Component/s: Frontend

      Description

      Random expressions on the consumer side of runtime filters are evaluated independently of the "final" join, which gives rows an extra chance to be dropped. This means that the same query can return fewer or different rows when the runtime filter is used than when it is not.

      Example:

      use tpch_parquet;
      
      set DISABLE_ROW_RUNTIME_FILTERING=0;
      select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as int) = n_nationkey;
      result: 9722
      
      set DISABLE_ROW_RUNTIME_FILTERING=1;
      select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as int) = n_nationkey;
      result: 9803
      

      (rand() is pseudo-random, so rerunning the same query without changing the query option always returns the same result.)

      Optimizations like runtime filters should have no effect on the results, even in case of non-deterministic expressions.
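
The effect can be illustrated outside Impala with a small simulation. This is only a toy model under stated assumptions, not Impala's implementation: the runtime filter is modeled as an exact set of build-side keys, supplier keys are assumed to be spread evenly over 25 nations, and the names `count_matches` and `probe_expr` are hypothetical. The point it shows is that the probe expression is drawn once for the filter and again for the join, so a row whose result depends on the random draw has to pass twice.

```python
import random

# Build side of the join: n_nationkey values (assumed to be 0..24).
NATION_KEYS = set(range(25))


def count_matches(supplier_keys, use_runtime_filter, seed=0):
    """Count probe rows that survive the (optional) filter and the join."""
    rng = random.Random(seed)

    def probe_expr(s_nationkey):
        # Mirrors s_nationkey + cast(rand()*2 as int): adds 0 or 1.
        return s_nationkey + int(rng.random() * 2)

    matched = 0
    for key in supplier_keys:
        # The filter evaluates the expression with its own random draw...
        if use_runtime_filter and probe_expr(key) not in NATION_KEYS:
            continue
        # ...and the join evaluates it again with a fresh draw.
        if probe_expr(key) in NATION_KEYS:
            matched += 1
    return matched


# Hypothetical probe side: 10000 rows spread evenly over keys 0..24.
supplier_keys = [i % 25 for i in range(10000)]
without_filter = count_matches(supplier_keys, use_runtime_filter=False)
with_filter = count_matches(supplier_keys, use_runtime_filter=True)
print("without filter:", without_filter)
print("with filter:   ", with_filter)
```

In this model only rows with key 24 can fail, because 24 + 1 has no match on the build side. Without the filter such a row survives about half the time; with the filter it must survive two independent draws, so roughly a quarter survive. That is the same direction of difference as in the counts reported above.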

              People

              • Assignee:
                Unassigned
                Reporter:
                csringhofer Csaba Ringhofer
              • Votes:
                0
                Watchers:
                3
