Apache Drill / DRILL-6871

Enabling runtime filter eliminates more incoming rows than it should.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.15.0
    • Fix Version/s: 1.16.0
    • Component/s: Execution - Flow
    • Labels:
      None

      Description

      When testing on the TPC-H dataset (scale factor 100) on a 4-node setup with the following option settings:

      exec.hashjoin.bloom_filter.fpp=0.2
      exec.hashjoin.enable.runtime_filter=true
      exec.hashjoin.runtime_filter.max.waiting.time=20000
      

      The runtime filter eliminated more rows than it should, and the row count varied between runs of the same query.
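      For context, a Bloom filter can report false positives but never false negatives, so a probe-side filter built with fpp=0.2 should only let some extra rows through; it should never drop rows that actually join. The sketch below illustrates that property. It is a minimal, generic Bloom filter, not Drill's implementation; the class name, the SHA-256 based hashing, and the key ranges are all assumptions made for illustration.

```python
import hashlib
import math


class BloomFilter:
    """Minimal generic Bloom filter (illustrative; not Drill's implementation)."""

    def __init__(self, expected_items: int, fpp: float):
        # Textbook sizing formulas for a target false-positive probability.
        self.num_bits = max(1, math.ceil(-expected_items * math.log(fpp) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive one bit position per hash function by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# Hypothetical build-side join keys (every third integer).
build_keys = range(0, 10_000, 3)
bloom = BloomFilter(expected_items=len(build_keys), fpp=0.2)
for k in build_keys:
    bloom.add(k)

# Probe side: keep only rows whose key might be on the build side.
probe_keys = range(10_000)
passed = [k for k in probe_keys if bloom.might_contain(k)]
true_matches = [k for k in probe_keys if k % 3 == 0]

# A correct filter never drops a true match; it may pass some non-matches.
missing = set(true_matches) - set(passed)
print(f"true matches: {len(true_matches)}, passed filter: {len(passed)}, dropped matches: {len(missing)}")
```

      With a correct filter the number of dropped matches is always zero, so a final row count below the expected one points at a bug in how the filter is built, merged, or applied rather than at the fpp setting.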

       

      0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, supplier s where l.l_suppkey = s.s_suppkey and s.s_acctbal <1000);
      +---------+
      | EXPR$0  |
      +---------+
      | 405566  |
      +---------+
      1 row selected (10.565 seconds)
      0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, supplier s where l.l_suppkey = s.s_suppkey and s.s_acctbal <1000);
      +---------+
      | EXPR$0  |
      +---------+
      | 405769  |
      +---------+
      1 row selected (9.845 seconds)
      

      The expected row count for the above (broadcast-join) query is 109307880.

       

      0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, orders o where o.o_orderkey = l.l_orderkey and o.o_totalprice < 100000);
      +-----------+
      |  EXPR$0   |
      +-----------+
      | 37338355  |
      +-----------+
      1 row selected (44.698 seconds)
      0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, orders o where o.o_orderkey = l.l_orderkey and o.o_totalprice < 100000);
      +-----------+
      |  EXPR$0   |
      +-----------+
      | 38044874  |
      +-----------+
      1 row selected (44.871 seconds)
      

      The expected row count for the above (hash-partition join) query is 96176495.
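      To put the size of the discrepancy in perspective, the short calculation below computes what fraction of the expected rows each run returned. The counts are copied from the output above; the dictionary keys and variable names are only illustrative.

```python
# Observed counts per run and the expected count, as reported in this issue.
observed = {
    "broadcast": ([405566, 405769], 109307880),
    "hash_partition": ([37338355, 38044874], 96176495),
}

fractions = {}
for name, (counts, expected) in observed.items():
    # Fraction of the expected join output that survived the runtime filter.
    fractions[name] = [count / expected for count in counts]
    for count, fraction in zip(counts, fractions[name]):
        print(f"{name}: {count} of {expected} expected rows ({fraction:.2%})")
```

      In both cases the runs return well under half of the expected rows, and the two runs of each query disagree with each other.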

              People

              • Assignee: Weijie Tong (weijie)
              • Reporter: Kunal Khatua (kkhatua)
              • Reviewer: Sorabh Hamirwasia