  1. Apache Drill
  2. DRILL-7121

TPCH 4 takes longer when Statistics is disabled.

    Details

      Description

      Here is TPC-H query 4 at scale factor (sf) 100:

      select
        o.o_orderpriority,
        count(*) as order_count
      from
        orders o
      where
        o.o_orderdate >= date '1996-10-01'
        and o.o_orderdate < date '1996-10-01' + interval '3' month
        and exists (
          select
            *
          from
            lineitem l
          where
            l.l_orderkey = o.o_orderkey
            and l.l_commitdate < l.l_receiptdate
        )
      group by
        o.o_orderpriority
      order by
        o.o_orderpriority;
      

      The plan changes when statistics are disabled: a Hash Agg and a Broadcast Exchange are added. Together, these two operators expand the number of rows coming from the lineitem table from 137M to 9B. This forces the hash join to use 6 GB of memory instead of 30 MB.
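
      To put the reported figures in proportion, here is a minimal arithmetic sketch. It uses only the rounded numbers quoted above; the variable names are illustrative, and the conversion 1 GB = 1024 MB is an assumption.

```python
# Figures quoted in the description (rounded in the report).
ROWS_WITH_STATS = 137_000_000        # lineitem rows with statistics enabled
ROWS_WITHOUT_STATS = 9_000_000_000   # rows after the added Hash Agg + Broadcast Exchange
MEM_WITH_STATS_MB = 30               # hash join memory with statistics enabled
MEM_WITHOUT_STATS_MB = 6 * 1024      # 6 GB, assuming 1 GB = 1024 MB

# Ratio of the two plans for each metric.
row_expansion = ROWS_WITHOUT_STATS / ROWS_WITH_STATS
memory_growth = MEM_WITHOUT_STATS_MB / MEM_WITH_STATS_MB

print(f"row expansion: ~{row_expansion:.0f}x")
print(f"hash join memory growth: ~{memory_growth:.0f}x")
```

      With these inputs the row count grows by roughly 66x and the hash join memory by roughly 205x.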

              People

              • Assignee:
                gparai Gautam Parai
                Reporter:
                rhou Robert Hou
