Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-9132 CBO: Calcite Operator To Hive Operator (Calcite Return Path)
  3. HIVE-28254

CBO (Calcite Return Path): Multiple DISTINCT leads to wrong results

    XMLWordPrintableJSON

Details

    Description

      CBO return path can build incorrect GroupByOperator when multiple aggregations with DISTINCT are involved.

      This is an example.

      CREATE TABLE test (col1 INT, col2 INT);
      INSERT INTO test VALUES (1, 100), (2, 200), (2, 200), (3, 300);
      
      set hive.cbo.returnpath.hiveop=true;
      set hive.map.aggr=false;
      
      SELECT
        SUM(DISTINCT col1),
        COUNT(DISTINCT col1),
        SUM(DISTINCT col2),
        SUM(col2)
      FROM test;

      The last column should be 800. But the SUM refers to col1 and the actual result is 8.

      +------+------+------+------+
      | _c0  | _c1  | _c2  | _c3  |
      +------+------+------+------+
      | 6    | 3    | 600  | 8    |
      +------+------+------+------+ 

       

      Attachments

        Issue Links

          Activity

            People

              okumin Shohei Okumiya
              okumin Shohei Okumiya
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: