Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-2178

Impala returns incorrect value when group by multiple CHAR column

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: Impala 2.1, Impala 2.2
    • Fix Version/s: Impala 2.2.7, Impala 2.3.0
    • Component/s: None
    • Labels:

      Description

      When group by contains multiple CHAR columns which have size <=2, Impala returns incorrect values for some columns.
      Step to reproduce:
      1. create a data file and put it into hdfs

      10067481,A,0000,1A,3001,T,9,74.01
      10067481,C,0020,0A,1001, ,4,834.01
      10067488,I,0000,0A,1001,P,1,234.01
      10067488, ,0002,0B,1001,S,5,34.01
      9006748,S,0000,0A,1001, ,8,234.01
      10067488,F,0100,2A,1000,P,1,45234.01
      10067481,C,0020,0A,1001, ,0,2034.01
      10067483,I,0000,0A,1001,P,1,234.01
      10067483, ,0000,0A,1001,P,3,34.01
      9006748,I,0000,0A,1001,P,6,24.01
      10067483,G,0000,0A,1001,P,5,1234.01
      10067484,I,0000,0A,1001,P,6,24.01
      9006748,I,0000,0A,1001,P,4,74.01
      10067484,S,0400,0A,1001,P,8,234.01
      10067489,A,0400,1A,3001,T,9,74.01
      9006748,T,0040,0A,1001,P,2,1134.01
      10067489,C,0020,0A,1001, ,4,834.01
      10067483,I,0000,0A,1005, ,4,4.01
      10067483,G,0000,0A,1005, ,9,24.00
      10067484,T,0300,0A,1001,P,2,1134.01
      10067484,I,0300,0A,1005,P,4,74.01
      10067489,C,0020,0A,1001, ,0,2034.01
      10067488,F,0100,2A,1001,P,6,278.1
      

      2. create table that contains multiple char columns by using this data file.

      create external table test_groupby (
        col_id  int,
        col_char1  CHAR(1),
        col_char4 CHAR(4),
        col_char2       CHAR(2),
        col2_char4 CHAR(4),
        col2_char1  CHAR(1),
        col_decimal1   DECIMAL(1,0),
        col_decimal2  DECIMAL(14,2)
        )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      location '/test-warehouse/test_groupby'
      

      3. run a group by query. the "col_id" in resultset doesn't exist in original data. however, the row count and rest columns contain right value.

      Query: select col_id,
      col_char2,
      col2_char1,
      Sum(col_decimal1)     sum1,
      Sum(col_decimal2)     sum2
      FROM  test_groupby_txt
      GROUP  BY col_id,
      col_char1,
      col_char2,
      col2_char1
      +-----------------+---------+------------------+-------------+----------------+
      | col_id            | col_char2 | col2_char1 | sum1 | sum2|
      +-----------------+---------+------------------+-------------+----------------+
      | 10067521        | 0A      |                  | 16          | 11472.08       |
      | 9006657         | 0A      | P                | 20          | 196.04         |
      | 10067521        | 1A      | T                | 36          | 296.04         |
      | 9006657         | 0A      | P                | 8           | 234.01         |
      | 10067521        | 0A      | P                | 16          | 468.02         |
      | 10067521        | 0A      | P                | 4           | 2268.02        |
      | 10067522        | 0B      | S                | 10          | 68.02          |
      | 10067521        | 2A      | P                | 14          | 91024.22       |
      | 9006657         | 0A      |                  | 8           | 234.01         |
      | 10067521        | 0A      | P                | 6           | 68.02          |
      | 9006657         | 0A      | P                | 4           | 2268.02        |
      | 10067521        | 0A      | P                | 19          | 2492.02        |
      | 10067521        | 0A      |                  | 4           | 4.01           |
      | 10067521        | 0A      | P                | 28          | 1136.09        |
      | 10067521        | 0A      |                  | 9           | 24.00          |
      +-----------------+---------+------------------+-------------+----------------+
      

        Attachments

          Activity

            People

            • Assignee:
              skye Skye Wanderman-Milne
              Reporter:
              jyu@cloudera.com Juan Yu
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: