Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4153

count(*) is not working properly on all blank('') columns in RCFile stored tables

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Impala 2.5.0
    • Fix Version/s: Impala 2.8.0
    • Component/s: Backend
    • Labels:

      Description

      The aggregate function count is not working properly on those table columns that contain '' (blank) values only. It returns incorrect values seemingly handleing it as NULLs. The issue is reproducible in multiple environments (CDH 5.5.2, 5.7.2) and it seems it is Impala and RCFile specific:

      [morhidi-552-sec-2.gce.cloudera.com:21000] > select * from impala_rcfile_count_issue;
      Query: select * from impala_rcfile_count_issue
      +------+------+------+
      | col1 | col2 | col3 |
      +------+------+------+
      | 1    |      | a    |
      |      |      | b    |
      | 2    |      | c    |
      +------+------+------+
      Fetched 3 row(s) in 0.22s
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select * from impala_rcfile_count_issue where col1='';
      Query: select * from impala_rcfile_count_issue where col1=''
      +------+------+------+
      | col1 | col2 | col3 |
      +------+------+------+
      |      |      | b    |
      +------+------+------+
      Fetched 1 row(s) in 0.12s
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select count(*) from impala_rcfile_count_issue where col1='';
      Query: select count(*) from impala_rcfile_count_issue where col1=''
      +----------+
      | count(*) |
      +----------+
      | 1        |
      +----------+
      Fetched 1 row(s) in 0.67s
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select * from impala_rcfile_count_issue where col2='';
      Query: select * from impala_rcfile_count_issue where col2=''
      +------+------+------+
      | col1 | col2 | col3 |
      +------+------+------+
      | 1    |      | a    |
      |      |      | b    |
      | 2    |      | c    |
      +------+------+------+
      Fetched 3 row(s) in 0.12s
      
      INCORRECT:
      
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select count(*) from impala_rcfile_count_issue where col2='';
      Query: select count(*) from impala_rcfile_count_issue where col2=''
      +----------+
      | count(*) |
      +----------+
      | 0        |
      +----------+
      Fetched 1 row(s) in 0.76s
      
      
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select * from impala_rcfile_count_issue where col1 is NULL;
      Query: select * from impala_rcfile_count_issue where col1 is NULL
      
      Fetched 0 row(s) in 0.22s
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select count(*) from impala_rcfile_count_issue where col1 is NULL;
      Query: select count(*) from impala_rcfile_count_issue where col1 is NULL
      +----------+
      | count(*) |
      +----------+
      | 0        |
      +----------+
      Fetched 1 row(s) in 0.68s
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select * from impala_rcfile_count_issue where col2 is NULL;
      Query: select * from impala_rcfile_count_issue where col2 is NULL
      
      Fetched 0 row(s) in 0.12s
      
      INCORRECT:
      
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select count(*) from impala_rcfile_count_issue where col2 is NULL;
      Query: select count(*) from impala_rcfile_count_issue where col2 is NULL
      +----------+
      | count(*) |
      +----------+
      | 3        |
      +----------+
      Fetched 1 row(s) in 0.64s
      

      Here is the command to create the table from Hive:

      create table impala_rcfile_count_issue stored as rcfile as
      select * from
      (
      select '1' as col1,'' as col2, 'a' as col3
      union all
      select '' as col1,'' as col2, 'b' as col3
      union all
      select '2' as col1,'' as col2, 'c' as col3
      ) t;
      

        Attachments

          Activity

            People

            • Assignee:
              laszlog Laszlo Gaal
              Reporter:
              morhidi Matyas Orhidi
            • Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: