Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-6226

Analytic planner misses some opportunities to merge sort groups

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 2.10.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels:
      None
    • Docs Text:
      Workaround: rewrite analytic functions where possible to use the same sort order.
    • Target Version:
    • Epic Color:
      ghx-label-6

      Description

      create table testrn (g int, v int, r int);
      insert into testrn (g,v,r) select 1,1,1 union select 1,2,2 union select 2,1,3 union select 2,2,4;
      select g,first_value(v) over(partition by g order by r),row_number() over (partition by g order by r desc) from testrn;
      

      The above analytical function query produces the following plan:

      Max Per-Host Resource Reservation: Memory=20.00MB
      Per-Host Resource Estimates: Memory=52.00MB
      WARNING: The following tables are missing relevant table and/or column statistics.
      default.testrn
      
      F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
      |  Per-Host Resources: mem-estimate=0B mem-reservation=0B
      PLAN-ROOT SINK
      |  mem-estimate=0B mem-reservation=0B
      |
      06:EXCHANGE [UNPARTITIONED]
      |  mem-estimate=0B mem-reservation=0B
      |  tuple-ids=6,3 row-size=24B cardinality=unavailable
      |
      F01:PLAN FRAGMENT [HASH(g)] hosts=1 instances=1
      Per-Host Resources: mem-estimate=20.00MB mem-reservation=20.00MB
      04:ANALYTIC
      |  functions: row_number()
      |  partition by: g
      |  order by: r DESC
      |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
      |  tuple-ids=6,3 row-size=24B cardinality=unavailable
      |
      03:SORT
      |  order by: g ASC NULLS FIRST, r DESC
      |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
      |  tuple-ids=6 row-size=16B cardinality=unavailable
      |
      02:ANALYTIC
      |  functions: first_value(v)
      |  partition by: g
      |  order by: r ASC
      |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB
      |  tuple-ids=4,2 row-size=16B cardinality=unavailable
      |
      01:SORT
      |  order by: g ASC NULLS FIRST, r ASC
      |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB
      |  tuple-ids=4 row-size=12B cardinality=unavailable
      |
      05:EXCHANGE [HASH(g)]
      |  mem-estimate=0B mem-reservation=0B
      |  tuple-ids=0 row-size=12B cardinality=unavailable
      |
      F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
      Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B
      00:SCAN HDFS [default.testrn, RANDOM]
         partitions=1/1 files=1 size=24B
         stats-rows=unavailable extrapolated-rows=disabled
         table stats: rows=unavailable size=unavailable
         column stats: unavailable
         mem-estimate=32.00MB mem-reservation=0B
         tuple-ids=0 row-size=12B cardinality=unavailable
      

      I believe we could avoid sorting twice by rewriting first_value() into last_value(). I suspect there are a few other cases where the analytic functions can be "flipped" like this to handle a different sort order.

      The solution probably involves adding some extra cases to this logic: https://github.com/apache/incubator-impala/blob/5e9b4e2/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java#L729 to detect when the expressions are identical up to the ASC/DESC and when the function can be "flipped" to produce the correct result.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                tarmstrong Tim Armstrong
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated: