Apache Drill / DRILL-6871

Enabling runtime filter eliminates more incoming rows than it should.


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 1.15.0
    • Fix Version: 1.16.0
    • Component: Execution - Flow
    • Labels: None

    Description

      When testing the following combination of options on the TPC-H dataset (scale factor 100) on a 4-node setup:

      exec.hashjoin.bloom_filter.fpp=0.2
      exec.hashjoin.enable.runtime_filter=true
      exec.hashjoin.runtime_filter.max.waiting.time=20000
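      To reproduce, these options can be set for the current sqlline session. A minimal sketch, assuming session scope is sufficient (ALTER SYSTEM would apply them cluster-wide); the option names and values are the ones listed above, and the millisecond unit for the waiting time is an assumption.

```sql
-- Session-scoped settings used in this report.

-- Bloom filter false-positive probability.
ALTER SESSION SET `exec.hashjoin.bloom_filter.fpp` = 0.2;

-- Enable runtime filtering for hash joins.
ALTER SESSION SET `exec.hashjoin.enable.runtime_filter` = true;

-- Maximum time to wait for the runtime filter (assumed to be in milliseconds).
ALTER SESSION SET `exec.hashjoin.runtime_filter.max.waiting.time` = 20000;
```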
      

      The runtime filter was observed to eliminate more rows than it should, and repeated runs of the same query returned different counts.

       

      0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, supplier s where l.l_suppkey = s.s_suppkey and s.s_acctbal <1000);
      +---------+
      | EXPR$0  |
      +---------+
      | 405566  |
      +---------+
      1 row selected (10.565 seconds)
      0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, supplier s where l.l_suppkey = s.s_suppkey and s.s_acctbal <1000);
      +---------+
      | EXPR$0  |
      +---------+
      | 405769  |
      +---------+
      1 row selected (9.845 seconds)
      

      The expected row count for the above (broadcast-join) query is 109307880.

       

      0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, orders o where o.o_orderkey = l.l_orderkey and o.o_totalprice < 100000);
      +-----------+
      |  EXPR$0   |
      +-----------+
      | 37338355  |
      +-----------+
      1 row selected (44.698 seconds)
      0: jdbc:drill:schema=dfs.par100> select count(*) from (select * from lineitem l, orders o where o.o_orderkey = l.l_orderkey and o.o_totalprice < 100000);
      +-----------+
      |  EXPR$0   |
      +-----------+
      | 38044874  |
      +-----------+
      1 row selected (44.871 seconds)
      

      The expected row count for the above (hash partition-join) query is 96176495.
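      A Bloom filter has no false negatives: a false positive (at a rate governed by exec.hashjoin.bloom_filter.fpp) only lets an extra probe-side row through to the hash join, which then discards it during key matching. A correct runtime filter should therefore never change a count(*) result. A minimal sketch for getting a baseline, assuming the same session, is to disable the filter and rerun both queries; the counts in the comments are the expected values stated in this report.

```sql
-- Disable runtime filtering for this session only.
ALTER SESSION SET `exec.hashjoin.enable.runtime_filter` = false;

-- Broadcast join; expected count per this report: 109307880.
select count(*) from (select * from lineitem l, supplier s where l.l_suppkey = s.s_suppkey and s.s_acctbal <1000);

-- Hash partition join; expected count per this report: 96176495.
select count(*) from (select * from lineitem l, orders o where o.o_orderkey = l.l_orderkey and o.o_totalprice < 100000);
```

      Any difference between these counts and the counts with the filter enabled indicates that the filter dropped rows that should have joined.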


          People

            Weijie Tong
            Kunal Khatua
            Sorabh Hamirwasia
