Apache Drill / DRILL-7109

Statistics adds external sort, which spills to disk

Details

    Description

      TPC-H query 4 at scale factor 100 runs many times slower with statistics enabled. One cause is that the plan now contains an extra external sort, and both external sorts spill to disk.

      Also, the hash join sees 100x more data.

      Here is the query:

      select
        o.o_orderpriority,
        count(*) as order_count
      from
        orders o
      where
        o.o_orderdate >= date '1996-10-01'
        and o.o_orderdate < date '1996-10-01' + interval '3' month
        and exists (
          select
            *
          from
            lineitem l
          where
            l.l_orderkey = o.o_orderkey
            and l.l_commitdate < l.l_receiptdate
        )
      group by
        o.o_orderpriority
      order by
        o.o_orderpriority;
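
      One way to locate the extra external sort is to compare the query plans produced with and without statistics. The sketch below assumes that the `planner.statistics.use` session option controls whether the planner consults statistics in the build under test; that option name is an assumption and does not come from this report. The `orders` and `lineitem` tables are taken to resolve in the same workspace used for the original run.

```sql
-- Assumption: `planner.statistics.use` toggles statistics-based planning.
ALTER SESSION SET `planner.statistics.use` = false;

-- Baseline plan, without statistics.
EXPLAIN PLAN FOR
select
  o.o_orderpriority,
  count(*) as order_count
from
  orders o
where
  o.o_orderdate >= date '1996-10-01'
  and o.o_orderdate < date '1996-10-01' + interval '3' month
  and exists (
    select *
    from lineitem l
    where
      l.l_orderkey = o.o_orderkey
      and l.l_commitdate < l.l_receiptdate
  )
group by
  o.o_orderpriority
order by
  o.o_orderpriority;

-- Re-enable statistics, then run the same EXPLAIN PLAN FOR statement again
-- and compare the two plans.
ALTER SESSION SET `planner.statistics.use` = true;
```

      If the regression is as described, the second plan should contain one more Sort operator than the first, and the row-count estimates feeding the hash join may differ between the two plans.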
      

            People

              gparai Gautam Parai
              rhou Robert Hou