Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-4469

SUM window query returns incorrect results over integer data

    XMLWordPrintableJSON

Details

    Description

      SUM window query returns incorrect results as compared to Postgres, with or without the frame clause in the window definition. Note that there is a sub query involved and data in column c1 is sorted integer data with no nulls.

      Drill 1.6.0 commit ID: 6d5f4983

      Results from Drill 1.6.0

      0: jdbc:drill:schema=dfs.tmp> SELECT SUM(c1) OVER w FROM (select * from dfs.tmp.`t_alltype`) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
      +---------+
      | EXPR$0  |
      +---------+
      | 10585   |
      | 10585   |
      | 10585   |
      | 10585   |
      | 10585   |
      | 10585   |
      ...
      | 10585  |
      | 10585  |
      | 10585  |
      +--------+
      145 rows selected (0.257 seconds)
      

      results from Postgres 9.3

      postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
       sum
      ------
       4499
       4499
       4499
       4499
       4499
       4499
      ...
       5613
       5613
       5613
        473
        473
        473
        473
        473
      (145 rows)
      

      Removing the frame clause from window definition, still results in completely different results on Postgres vs Drill

      Results from Drill 1.6.0

      0: jdbc:drill:schema=dfs.tmp>    SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1);
      +---------+
      | EXPR$0  |
      +---------+
      | 10585   |
      | 10585   |
      | 10585   |
      | 10585   |
      | 10585   |
      | 10585   |
      | 10585   |
      | 10585   |
      | 10585   |
      ...
      | 10585  |
      | 10585  |
      | 10585  |
      | 10585  |
      | 10585  |
      +--------+
      145 rows selected (0.28 seconds)
      

      Results from Postgres

      postgres=# SELECT SUM(c1) OVER w FROM (select * from t_alltype) subQry WINDOW w AS (PARTITION BY c8 ORDER BY c1);
       sum
      ------
          5
         12
         21
         33
         47
         62
         78
         96
        115
        135
        158
        182
        207
        233
        260
        289
      ...
      4914
       5051
       5189
       5328
       5470
       5613
          8
         70
        198
        332
        473
      (145 rows)
      

      Attachments

        1. t_alltype.csv
          18 kB
          Khurram Faraaz
        2. t_alltype.parquet
          12 kB
          Khurram Faraaz

        Issue Links

          Activity

            People

              volodymyr Vova Vysotskyi
              khfaraaz Khurram Faraaz
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: