  1. IMPALA
  2. IMPALA-6266

Runtime filters should not have non-deterministic expressions on the consumer side

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
    • Fix Version/s: None
    • Component/s: Frontend

    Description

      Non-deterministic expressions (such as rand()) on the consumer side of a runtime filter are evaluated independently of the "final" join, so each row gets one extra chance to be dropped. As a result, the same query can return fewer or different rows when the runtime filter is used than when it is not.

      Example:

      use tpch_parquet;
      
      set DISABLE_ROW_RUNTIME_FILTERING=0;
      select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as int) = n_nationkey;
      result: 9722
      
      set DISABLE_ROW_RUNTIME_FILTERING=1;
      select count(*) from supplier join nation on s_nationkey + cast(rand()*2 as int) = n_nationkey;
      result: 9803
      

      (rand() is pseudo-random, so running the same query without changing the query option always returns the same result.)

      Optimizations like runtime filters should have no effect on the results, even in case of non-deterministic expressions.
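The double evaluation described above can be illustrated with a small simulation. This is only a sketch and not Impala code: the function names, the 400 rows per nation, and the use of an exact key set in place of Impala's actual runtime filter structure are all assumptions made for illustration. It models the join predicate s_nationkey + cast(rand()*2 as int) = n_nationkey, where the random term is 0 or 1 and the nation keys are assumed to be 0..24.

```python
import random

# Assumption: nation keys are 0..24, as in the TPC-H nation table.
NATION_KEYS = set(range(25))


def count_join_rows(supplier_keys, use_runtime_filter, rng):
    """Count rows of: supplier JOIN nation ON s_nationkey + bit = n_nationkey.

    `bit` stands in for cast(rand()*2 as int), i.e. 0 or 1.
    The runtime filter is modeled as an exact membership test (an assumption).
    """
    matched = 0
    for s_nationkey in supplier_keys:
        if use_runtime_filter:
            # The filter evaluates the probe expression with its own random draw.
            if s_nationkey + rng.randint(0, 1) not in NATION_KEYS:
                continue  # row dropped before it reaches the join
        # The join evaluates the same expression again with a fresh draw.
        if s_nationkey + rng.randint(0, 1) in NATION_KEYS:
            matched += 1
    return matched


def simulate(seed=0, rows_per_nation=400):
    """Return (count without filter, count with filter) for a synthetic supplier table."""
    supplier_keys = [k for k in range(25) for _ in range(rows_per_nation)]
    without_filter = count_join_rows(supplier_keys, False, random.Random(seed))
    with_filter = count_join_rows(supplier_keys, True, random.Random(seed))
    return without_filter, with_filter


if __name__ == "__main__":
    without_filter, with_filter = simulate()
    print("without runtime filter:", without_filter)
    print("with runtime filter:   ", with_filter)
```

In this model only rows with s_nationkey = 24 can fail to match. Without the filter such a row survives when its single draw is 0. With the filter it survives only when both independent draws are 0, so the filtered run is expected to return fewer rows.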

            People

              Assignee: Unassigned
              Reporter: Csaba Ringhofer (csringhofer)