Uploaded image for project: 'Apache Drill'
  1. Apache Drill
  2. DRILL-3087

Union All returns incorrect results.

VotersWatch issueWatchersLinkCloneUpdate Comment AuthorReplace String in CommentUpdate Comment VisibilityDelete Comments
    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Fixed
    • 1.0.0
    • 1.0.0
    • None
    • 21cc578b6b8c8f3ca1ebffd3dbb92e35d68bc726

    Description

      Union All returns incorrect results when the same aggregate + group by query is used on both sides of Union All. Test was run from sqlline on 4 node cluster on CentOS.

      Total number of records in the parquet file are,

      0: jdbc:drill:schema=dfs.tmp> select count(*) from `twokeyParq_02/0_0_0.parquet`;
      +------------+
      |   EXPR$0   |
      +------------+
      | 26212355   |
      +------------+
      1 row selected (0.17 seconds)
      

      Results returned by aggregate + group by query.

      0: jdbc:drill:schema=dfs.tmp> select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2;
      +------------+------------+
      |  max_key1  |    key2    |
      +------------+------------+
      | 1.42931543226E9 | m          |
      | 1.42931281008E9 | e          |
      | 1.42931334853E9 | k          |
      | 1.42931595791E9 | n          |
      | 1.42435032101E9 | 1          |
      | 1.42931539809E9 | l          |
      | 1.42931336919E9 | c          |
      | 1.40095133379E9 | 0          |
      | 1.42931553374E9 | h          |
      | 1.42931539751E9 | j          |
      | 1.42931480081E9 | f          |
      | 1.42931347924E9 | b          |
      | 1.42931509068E9 | g          |
      | 1.42931380603E9 | d          |
      | 1.42931626355E9 | a          |
      | 1.42931450347E9 | i          |
      +------------+------------+
      16 rows selected (19.262 seconds)
      

      There Union All results are incorrect.

      0: jdbc:drill:schema=dfs.tmp> select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2);
      +------------+------------+
      |  max_key1  |    key2    |
      +------------+------------+
      | 1.42931380603E9 | d          |
      | 1.42931336919E9 | c          |
      | 1.42931347924E9 | b          |
      | 1.42435032101E9 | 1          |
      | 1.42931626355E9 | a          |
      | 1.40095133379E9 | 0          |
      | 1.42931334853E9 | k          |
      | 1.42931543226E9 | m          |
      | 1.42931539751E9 | j          |
      | 1.42931553374E9 | h          |
      | 1.42931281008E9 | e          |
      | 1.42931595791E9 | n          |
      | 1.42931509068E9 | g          |
      | 1.42931480081E9 | f          |
      | 1.42931539809E9 | l          |
      | 1.42931450347E9 | i          |
      | 1870700.0  | d          |
      | 1871853.0  | c          |
      | 1870682.0  | b          |
      | 180.0      | 1          |
      | 1872286.0  | a          |
      | 156.0      | 0          |
      | 1873931.0  | k          |
      | 1874177.0  | m          |
      | 1872252.0  | j          |
      | 1872734.0  | h          |
      | 1871510.0  | e          |
      | 1874465.0  | n          |
      | 1873607.0  | g          |
      | 1870453.0  | f          |
      | 1870933.0  | l          |
      | 1872436.0  | i          |
      +------------+------------+
      32 rows selected (39.183 seconds)
      

      Physical query plan

      0: jdbc:drill:schema=dfs.tmp> explain plan for select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2);
      +------------+------------+
      |    text    |    json    |
      +------------+------------+
      | 00-00    Screen
      00-01      UnionAll(all=[true])
      00-03        Project(max_key1=[$1], key2=[$0])
      00-05          HashAgg(group=[{0}], max_key1=[MAX($1)])
      00-07            HashAgg(group=[{0}], max_key1=[MAX($1)])
      00-09              Project(key2=[$0], $f1=[CAST($1):DOUBLE])
      00-11                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]])
      00-02        Project(max_key1=[$1], key2=[$0])
      00-04          HashAgg(group=[{0}], max_key1=[$SUM0($1)])
      00-06            HashAgg(group=[{0}], max_key1=[COUNT($1)])
      00-08              Project(key2=[$0], $f1=[CAST($1):DOUBLE])
      00-10                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]])
      

      Attachments

        Activity

          This comment will be Viewable by All Users Viewable by All Users
          Cancel

          People

            jni Jinfeng Ni
            khfaraaz Khurram Faraaz
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Slack

                Issue deployment