IMPALA-10473

Order by a constant should not be ignored in row_number()

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0, Impala 3.4.0
    • Fix Version/s: Impala 4.0.0
    • Component/s: Frontend

    Description

      thundergun found that row_number() with an ORDER BY on a constant returns wrong results when the query runs with more than one fragment instance:

      create table t1(c1 int) stored as textfile;
      -- Insert 3 times to create 3 files
      insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
      insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
      insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
      -- Wrong plan missing a sort node after scan. Analytic is wrongly performed locally.
      set exec_single_node_rows_threshold=0;
      select row_number() over (order by '1') from t1;
      +------------------------+
      | row_number() OVER(...) |
      +------------------------+
      | 1                      |
      | 2                      |
      | 3                      |
      | 4                      |
      | 5                      |
      | 6                      |
      | 7                      |
      | 8                      |
      | 9                      |
      | 10                     |
      | 1                      |
      | 2                      |
      | 3                      |
      | 4                      |
      | 5                      |
      | 6                      |
      | 7                      |
      | 8                      |
      | 9                      |
      | 10                     |
      | 1                      |
      | 2                      |
      | 3                      |
      | 4                      |
      | 5                      |
      | 6                      |
      | 7                      |
      | 8                      |
      | 9                      |
      | 10                     |
      +------------------------+
      

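      In the plan below, the ANALYTIC node is placed in the same fragment as the SCAN (F00, which has 3 instances). row_number() is therefore evaluated independently in each fragment instance, so the numbering restarts at 1 for every instance instead of running once over all rows.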

      F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
      |  Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1
      PLAN-ROOT SINK
      |  output exprs: row_number()
      |  mem-estimate=0B mem-reservation=0B thread-reservation=0
      |
      02:EXCHANGE [UNPARTITIONED]
      |  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
      |  tuple-ids=0,2 row-size=8B cardinality=15
      |  in pipelines: 00(GETNEXT)
      |
      F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
      Per-Host Resources: mem-estimate=36.00MB mem-reservation=4.01MB thread-reservation=2
      01:ANALYTIC
      |  functions: row_number()
      |  order by: '1' ASC
      |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
      |  tuple-ids=0,2 row-size=8B cardinality=15
      |  in pipelines: 00(GETNEXT)
      |
      00:SCAN HDFS [default.t1, RANDOM]
         HDFS partitions=1/1 files=3 size=60B
         stored statistics:
           table: rows=unavailable size=unavailable
           columns: all
         extrapolated-rows=disabled max-scan-range-rows=unavailable
         mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1
         tuple-ids=0 row-size=0B cardinality=15
         in pipelines: 00(GETNEXT) 
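
      The difference between per-instance and global numbering can be illustrated outside Impala with a small simulation. This is only a sketch in Python, not Impala code: the helper row_number() and the variables files, local_result and global_result are hypothetical names made up for this illustration, and the three lists stand in for the three files of t1.

```python
from itertools import chain


def row_number(rows):
    """Number rows 1..n in arrival order, as row_number() does when
    the ORDER BY key is a constant (every row ties)."""
    return [i for i, _ in enumerate(rows, start=1)]


# Three scan ranges of ten rows each, standing in for the three files of t1.
files = [[1] * 10 for _ in range(3)]

# Buggy plan shape: every fragment instance numbers its own rows,
# and the exchange simply concatenates the per-instance results.
local_result = list(chain.from_iterable(row_number(f) for f in files))

# Intended plan shape: rows are gathered into one stream first,
# then numbered exactly once.
global_result = row_number(list(chain.from_iterable(files)))

print(local_result)
print(global_result)
```

      The first list repeats 1 through 10 three times, matching the wrong output above; the second runs from 1 through 30, which is what a single global evaluation of row_number() should return for 30 rows.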

      This issue has been present since IMPALA-6323 and IMPALA-8069. IMPALA-6323 allows analytic functions to have a constant ORDER BY clause, and since IMPALA-8069 such clauses are always ignored. As a result, the analytic function is evaluated locally in each fragment instance instead of globally, which produces incorrect results for functions such as row_number().

            People

              Assignee: Quanlong Huang (stigahuang)
              Reporter: Quanlong Huang (stigahuang)