Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3689

incorrect results : aggregate AVG returns wrong results over results returned by LEAD function.

VotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    Description

      Aggregate AVG returns wrong results over results returned by LEAD function.

      results returned by Drill

      0: jdbc:drill:schema=dfs.tmp> SELECT  avg(lead_col1) FROM (SELECT LEAD(col1) OVER(PARTITION BY col7 ORDER BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) sub_query;
      +-------------------------+
      |         EXPR$0          |
      +-------------------------+
      | 2.35195986941647008E17  |
      +-------------------------+
      1 row selected (0.264 seconds)
      

      Explain plan for above query from Drill

      0: jdbc:drill:schema=dfs.tmp> explain plan for SELECT  avg(lead_col1) FROM (SELECT LEAD(col1) OVER(PARTITION BY col7 ORDER BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) sub_query;
      +------+------+
      | text | json |
      +------+------+
      | 00-00    Screen
      00-01      Project(EXPR$0=[$0])
      00-02        Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY NOT NULL])
      00-03          StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
      00-04            Project(w0$o0=[$3])
      00-05              Window(window#0=[window(partition {2} order by [1] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [LEAD($1)])])
      00-06                SelectionVectorRemover
      00-07                  Sort(sort0=[$2], sort1=[$1], dir0=[ASC], dir1=[ASC])
      00-08                    Project(T36¦¦*=[$0], col1=[$1], col7=[$2])
      00-09                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/FEWRWSPQQ_101]], selectionRoot=maprfs:/tmp/FEWRWSPQQ_101, numFiles=1, columns=[`*`]]])
      

      results returned by Postgres

      postgres=# SELECT  avg(lead_col1) FROM (SELECT LEAD(col1) OVER(PARTITION BY col7 ORDER BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) sub_query;
               avg         
      ---------------------
       1157533190627124568
      (1 row)
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            khfaraaz Khurram Faraaz
            khfaraaz Khurram Faraaz
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment