IMPALA-5679

Count star optimization gives incorrect result for parquet table partitioned by STRING column

      Description

      The issue can be reproduced as follows:

      > create table part_tbl_parq (a integer) partitioned by (p STRING) stored as parquet;
      > insert into part_tbl_parq partition(p="val100") values(100);
      > select * from part_tbl_parq;
      +-----+--------+
      | a   | p      |
      +-----+--------+
      | 100 | val100 |
      +-----+--------+
      
      > select p, count(a) from part_tbl_parq group by p;
      +--------+----------+
      | p      | count(a) |
      +--------+----------+
      | val100 | 1        |
      +--------+----------+
      
      > select p, count(*) from part_tbl_parq group by p;
      +---+----------+
      | p | count(*) |
      +---+----------+
      |   | 0        |
      +---+----------+
      

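      The last query is incorrect: it returns an empty partition value with a count of 0. The expected result is a single row, val100 with a count of 1, the same as the count(a) query returns.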

      The problem does not happen if the table is partitioned by an INT column:

      > create table part_tbl_parq2 (a integer) partitioned by (p integer) stored as parquet;
      > insert into part_tbl_parq2 partition(p=100) values(100);
      > select p, count(*) from part_tbl_parq2 group by p;
      +-----+----------+
      | p   | count(*) |
      +-----+----------+
      | 100 | 1        |
      +-----+----------+
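
      The comparison described above can be sketched as a small check. This is only an illustration, not part of Impala or its test suite: the helper name find_count_mismatches is hypothetical, the rows are copied by hand from the output in this report (no cluster is queried), and the blank partition cell is assumed to be an empty string. Using count(a) as the reference is valid here only because column a contains no NULLs.

```python
# Hypothetical helper: compares per-partition counts from two result sets.
# Rows are hard-coded from the report; nothing here talks to Impala.
def find_count_mismatches(count_star_rows, reference_rows):
    """Return (partition, observed, expected) for every partition that differs.

    Both arguments are lists of (partition_value, count) tuples.
    A partition missing from one side is reported with None for that side.
    """
    expected = dict(reference_rows)
    observed = dict(count_star_rows)
    mismatches = []
    for key in sorted(set(expected) | set(observed)):
        if expected.get(key) != observed.get(key):
            mismatches.append((key, observed.get(key), expected.get(key)))
    return mismatches


# STRING-partitioned table: count(*) output vs. count(a) output.
# Assumption: the blank cell in the count(*) output is an empty string.
string_count_star = [("", 0)]
string_count_a = [("val100", 1)]

# INT-partitioned table: count(*) output vs. the single inserted row.
int_count_star = [(100, 1)]
int_expected = [(100, 1)]

print(find_count_mismatches(string_count_star, string_count_a))
print(find_count_mismatches(int_count_star, int_expected))
```

      With the rows from this report, the first call flags both the empty partition value and the missing val100 partition, while the second call returns an empty list, matching the observation that only the STRING-partitioned table is affected.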
      

              People

              Assignee: Taras Bobrovytsky (tarasbob)
              Reporter: Attila Jeges (attilaj)