IMPALA-1270

Consider adding distinct aggregation to subqueries as perf optimization

Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Impala 2.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Frontend

    Description

      We should consider other rewrites for EXISTS subqueries. For TPC-H Q4, another rewrite is an inner join plus a distinct aggregation:

      select
        o_orderpriority,
        count(distinct l_orderkey) as order_count
      from lineitem l
      inner join orders o
        on (o.o_orderkey = l.l_orderkey and
            l.l_commitdate < l.l_receiptdate)
      where
        o_orderdate >= '1993-07-01' and
        o_orderdate < '1993-10-01'
      group by
        o_orderpriority
      order by
        o_orderpriority
      

      This can run much faster because we have more flexibility in how we execute the inner join. With the current plan, most of the cost goes into partitioning lineitem.
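
      As a sanity check on the rewrite, the EXISTS form of Q4 and the inner join + distinct form can be compared on toy data. The following is a minimal sketch that uses Python's sqlite3 module rather than Impala; the table contents are made up for illustration, and the equivalence relies on o_orderkey being unique in orders. It says nothing about performance, only that the two forms return the same rows.

```python
import sqlite3

# Toy stand-ins for the TPC-H tables; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table orders (
  o_orderkey integer primary key,  -- uniqueness is what makes the rewrite valid
  o_orderpriority text,
  o_orderdate text
);
create table lineitem (
  l_orderkey integer,
  l_commitdate text,
  l_receiptdate text
);
insert into orders values
  (1, '1-URGENT', '1993-07-15'),
  (2, '1-URGENT', '1993-08-01'),
  (3, '2-HIGH',   '1993-09-30'),
  (4, '2-HIGH',   '1993-11-01');
insert into lineitem values
  (1, '1993-08-01', '1993-08-05'),  -- late
  (1, '1993-08-02', '1993-08-09'),  -- late, same order (must count once)
  (2, '1993-08-20', '1993-08-10'),  -- not late
  (3, '1993-10-01', '1993-10-02'),  -- late
  (4, '1993-11-10', '1993-11-20');  -- late, but order is outside the date range
""")

# Q4 written with an EXISTS subquery.
exists_query = """
select o_orderpriority, count(*) as order_count
from orders o
where o_orderdate >= '1993-07-01'
  and o_orderdate < '1993-10-01'
  and exists (
    select 1 from lineitem l
    where l.l_orderkey = o.o_orderkey
      and l.l_commitdate < l.l_receiptdate)
group by o_orderpriority
order by o_orderpriority
"""

# The inner join + distinct rewrite proposed in this ticket.
join_query = """
select o_orderpriority, count(distinct l_orderkey) as order_count
from lineitem l
inner join orders o
  on (o.o_orderkey = l.l_orderkey and
      l.l_commitdate < l.l_receiptdate)
where o_orderdate >= '1993-07-01'
  and o_orderdate < '1993-10-01'
group by o_orderpriority
order by o_orderpriority
"""

exists_rows = conn.execute(exists_query).fetchall()
join_rows = conn.execute(join_query).fetchall()
print(exists_rows)
print(join_rows)
```

      Order 1 has two late line items, so the join produces two rows for it; count(distinct l_orderkey) collapses them back to one, matching the EXISTS semantics.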

            People

              Assignee: Tim Armstrong (tarmstrong)
              Reporter: Nong Li (nong_impala_60e1)
              Votes: 0
              Watchers: 4