Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4153

count(*) is not working properly on all blank('') columns in RCFile stored tables

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • Impala 2.5.0
    • Impala 2.8.0
    • Backend

    Description

      The aggregate function count is not working properly on those table columns that contain '' (blank) values only. It returns incorrect values seemingly handleing it as NULLs. The issue is reproducible in multiple environments (CDH 5.5.2, 5.7.2) and it seems it is Impala and RCFile specific:

      [morhidi-552-sec-2.gce.cloudera.com:21000] > select * from impala_rcfile_count_issue;
      Query: select * from impala_rcfile_count_issue
      +------+------+------+
      | col1 | col2 | col3 |
      +------+------+------+
      | 1    |      | a    |
      |      |      | b    |
      | 2    |      | c    |
      +------+------+------+
      Fetched 3 row(s) in 0.22s
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select * from impala_rcfile_count_issue where col1='';
      Query: select * from impala_rcfile_count_issue where col1=''
      +------+------+------+
      | col1 | col2 | col3 |
      +------+------+------+
      |      |      | b    |
      +------+------+------+
      Fetched 1 row(s) in 0.12s
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select count(*) from impala_rcfile_count_issue where col1='';
      Query: select count(*) from impala_rcfile_count_issue where col1=''
      +----------+
      | count(*) |
      +----------+
      | 1        |
      +----------+
      Fetched 1 row(s) in 0.67s
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select * from impala_rcfile_count_issue where col2='';
      Query: select * from impala_rcfile_count_issue where col2=''
      +------+------+------+
      | col1 | col2 | col3 |
      +------+------+------+
      | 1    |      | a    |
      |      |      | b    |
      | 2    |      | c    |
      +------+------+------+
      Fetched 3 row(s) in 0.12s
      
      INCORRECT:
      
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select count(*) from impala_rcfile_count_issue where col2='';
      Query: select count(*) from impala_rcfile_count_issue where col2=''
      +----------+
      | count(*) |
      +----------+
      | 0        |
      +----------+
      Fetched 1 row(s) in 0.76s
      
      
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select * from impala_rcfile_count_issue where col1 is NULL;
      Query: select * from impala_rcfile_count_issue where col1 is NULL
      
      Fetched 0 row(s) in 0.22s
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select count(*) from impala_rcfile_count_issue where col1 is NULL;
      Query: select count(*) from impala_rcfile_count_issue where col1 is NULL
      +----------+
      | count(*) |
      +----------+
      | 0        |
      +----------+
      Fetched 1 row(s) in 0.68s
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select * from impala_rcfile_count_issue where col2 is NULL;
      Query: select * from impala_rcfile_count_issue where col2 is NULL
      
      Fetched 0 row(s) in 0.12s
      
      INCORRECT:
      
      [morhidi-552-sec-2.gce.cloudera.com:21000] > select count(*) from impala_rcfile_count_issue where col2 is NULL;
      Query: select count(*) from impala_rcfile_count_issue where col2 is NULL
      +----------+
      | count(*) |
      +----------+
      | 3        |
      +----------+
      Fetched 1 row(s) in 0.64s
      

      Here is the command to create the table from Hive:

      create table impala_rcfile_count_issue stored as rcfile as
      select * from
      (
      select '1' as col1,'' as col2, 'a' as col3
      union all
      select '' as col1,'' as col2, 'b' as col3
      union all
      select '2' as col1,'' as col2, 'c' as col3
      ) t;
      

      Attachments

        Activity

          People

            laszlog Laszlo Gaal
            morhidi Matyas Orhidi
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: