IMPALA-9850

Avoid doing expensive join at the coordinator

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.4.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: None

    Description

      When a query joins two subqueries, each of which has its root fragment executing on the coordinator, the join itself is currently also executed on the coordinator. Here's an example query:

      select count(*) from
        (select rank() over (order by l_quantity) x from lineitem) dt1
        inner join
        (select rank() over (order by l_shipdate) y from lineitem) dt2
        on dt1.x = dt2.y
      

      This can result in poor performance because the join inputs can be quite large and the entire join runs on a single node. Ideally, both intermediate results would be redistributed after rank() has been computed, so that the join runs on the executor nodes.
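
      One way to check where the join is placed is to inspect the query plan. The following is a minimal sketch, assuming the same lineitem table as in the example query; the plan output is omitted because it varies by version and cluster:

      ```sql
      -- Level 3 (VERBOSE) shows how the plan is split into fragments.
      set explain_level=3;

      explain
      select count(*) from
        (select rank() over (order by l_quantity) x from lineitem) dt1
        inner join
        (select rank() over (order by l_shipdate) y from lineitem) dt2
        on dt1.x = dt2.y;
      ```

      In the resulting plan, the fragment that contains the join node is the one to look at: if it is labeled UNPARTITIONED, the join runs as a single instance rather than being spread across the executors.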

      A similar scenario arises when joining subqueries that each have an ORDER BY with a LIMIT.
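
      As an illustration of the ORDER BY with LIMIT case, here is a hypothetical query of the same shape. It is not taken from the original report; it reuses the lineitem columns from the example above, and the limit value is arbitrary:

      ```sql
      -- Hypothetical example: each subquery ends in ORDER BY ... LIMIT,
      -- so each one produces its final result in a single place before the join.
      select count(*) from
        (select l_quantity from lineitem order by l_quantity limit 1000000) dt1
        inner join
        (select l_quantity from lineitem order by l_shipdate limit 1000000) dt2
        on dt1.l_quantity = dt2.l_quantity;
      ```

      The larger the limit, the larger the join inputs, so the cost of running the join on one node would be expected to grow in the same way as in the rank() example.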

            People

              Assignee: Aman Sinha (amansinha)
              Reporter: Aman Sinha (amansinha)
              Votes: 0
              Watchers: 3
