IMPALA-901

Incorrect result with group by query with null value in group by data

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 1.3
    • Fix Version/s: Impala 1.3.1
    • None
    • None

    Description

      I've reproduced this on master, ed4cb660b7a60d9b9248df525c477bab4d218c4b, and a nightly c5 cluster.

      This problem seems to be data-dependent. Changing a single value in the underlying table can make the results correct (or incorrect). Changing the column type from tinyint to int can also make the results correct, but I'm not sure whether the reverse is true.

      The first select query below returns incorrect results (the row with the NULL is missing); the related queries that follow return correct results.

      [nightly-2.ent.cloudera.com:21000] > create table foo (col_1 int, col_2 tinyint);
      Query: create table foo (col_1 int, col_2 tinyint)
      
      Returned 0 row(s) in 0.17s
      
      
      [nightly-2.ent.cloudera.com:21000] > insert into foo values (0, -59), (0, null), (0, -4);
      Query: insert into foo values (0, -59), (0, null), (0, -4)
      Inserted 3 rows in 1.28s
      
      
      [nightly-2.ent.cloudera.com:21000] > select col_1, col_2 from foo group by 1, 2;
      Query: select col_1, col_2 from foo group by 1, 2
      +-------+-------+
      | col_1 | col_2 |
      +-------+-------+
      | 0     | -4    |
      | 0     | -59   |
      +-------+-------+
      Returned 2 row(s) in 0.05s
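
For comparison, the expected result of the query above can be checked against another SQL engine. The following is only a sketch, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in reference. It says nothing about Impala internals. It spells out the grouping columns by name instead of using the ordinals `1, 2`. Under standard GROUP BY semantics, NULL keys form their own group, so three rows are expected, including the `(0, NULL)` row that is missing above.

```python
import sqlite3

# Assumption: SQLite is used purely as a reference engine to illustrate the
# expected GROUP BY semantics; it is not Impala and does not reproduce the bug.
conn = sqlite3.connect(":memory:")
conn.execute("create table foo (col_1 int, col_2 tinyint)")
conn.execute("insert into foo values (0, -59), (0, null), (0, -4)")

# Same query as in the report, with column names instead of ordinals.
rows = conn.execute(
    "select col_1, col_2 from foo group by col_1, col_2"
).fetchall()

# All three distinct (col_1, col_2) pairs should come back; SQL NULL is
# returned to Python as None.
expected = {(0, -59), (0, None), (0, -4)}
missing = expected - set(rows)
print("rows:", rows)
print("missing groups:", missing)
```

If `missing` is non-empty on some engine, that engine is dropping a group in the same way the first query in this report does.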
      

      Changing the first value by 1 (from -59 to -60) gives the correct result:

      [nightly-2.ent.cloudera.com:21000] > drop table foo;
      Query: drop table foo
      
      
      [nightly-2.ent.cloudera.com:21000] > create table foo (col_1 int, col_2 tinyint);
      Query: create table foo (col_1 int, col_2 tinyint)
      
      Returned 0 row(s) in 0.35s
      
      
      [nightly-2.ent.cloudera.com:21000] > insert into foo values (0, -60), (0, null), (0, -4);
      Query: insert into foo values (0, -60), (0, null), (0, -4)
      Inserted 3 rows in 1.28s
      
      
      [nightly-2.ent.cloudera.com:21000] > select col_1, col_2 from foo group by 1, 2;
      Query: select col_1, col_2 from foo group by 1, 2
      +-------+-------+
      | col_1 | col_2 |
      +-------+-------+
      | 0     | -4    |
      | 0     | -60   |
      | 0     | NULL  |
      +-------+-------+
      Returned 3 row(s) in 0.07s
      

      Changing the data type of col_2 from tinyint to int, with the original values, also gives the correct result:

      [nightly-2.ent.cloudera.com:21000] > drop table foo;
      Query: drop table foo
      
      
      [nightly-2.ent.cloudera.com:21000] > create table foo (col_1 int, col_2 int);
      Query: create table foo (col_1 int, col_2 int)
      
      Returned 0 row(s) in 0.27s
      
      
      [nightly-2.ent.cloudera.com:21000] > insert into foo values (0, -59), (0, null), (0, -4);
      Query: insert into foo values (0, -59), (0, null), (0, -4)
      Inserted 3 rows in 1.60s
      
      
      [nightly-2.ent.cloudera.com:21000] > select col_1, col_2 from foo group by 1, 2;
      Query: select col_1, col_2 from foo group by 1, 2
      +-------+-------+
      | col_1 | col_2 |
      +-------+-------+
      | 0     | -4    |
      | 0     | NULL  |
      | 0     | -59   |
      +-------+-------+
      Returned 3 row(s) in 0.05s
      

          People

            henryr Henry Robinson
            caseyc casey