Uploaded image for project: 'IMPALA'
  1. IMPALA
  2. IMPALA-4894

Impala changes table schema when you do ALTER TABLE .. ADD COLUMNS but not partition schema resulting in false NULL values in Hive.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • Impala 2.2.4
    • None
    • Catalog
    • None

    Description

      Adding a column in IMPALA for a partitioned table causes Hive to display "NULL" values for the newly added column.

      == Reproduction Details ==

      <<< 1. Setup using impala.

      – Impala Setup –

      create table parttest (a int) partitioned by (c int);
      insert into table parttest partition (c=1) values (1);
      insert into table parttest partition (c=2) values (2);
      alter table parttest add columns (b int);
      insert into table parttest partition (c=1) values (3,3);
      

      <<< 2. Query using impala and hive to see the different results.

      – Query Results –

      *Note that Hive reports value of "b" as NULL for "C=1" whereas Impala correctly reports it as 3.
      *

      • Impala -
      [jrepo7-2:21000] > select * from parttest;
      Query: select * from parttest
      +---+------+---+
      | a | b    | c |
      +---+------+---+
      | 3 | 3    | 1 |
      | 2 | NULL | 2 |
      | 1 | NULL | 1 |
      +---+------+---+
      
      • Hive -
      hive> select * from parttest;
      OK
      1  NULL  1
      3  NULL  1
      2  NULL  2
      
      

      <<< 3. Do the following describes in hive to see the metadata difference.

      hive> desc parttest;
      OK
      a                     int
      b                     int
      c                     int
      
      # Partition Information
      # col_name              data_type             comment
      
      c                     int
      
      • Describe partition
      hive> desc parttest partition (c=1);
      OK
      a                     int
      c                     int
      
      # Partition Information
      # col_name              data_type             comment
      
      c                     int
      

      <<< 3.a. Alternatively you can do the following in the metastore database to see the difference.

      select tbl.TBL_NAME, tbl.TBL_ID, tbl.SD_ID table_sd_id, col.CD_ID, col.COLUMN_NAME from TBLS tbl join SDS sd on tbl.SD_ID = sd.SD_ID join COLUMNS_V2 col on sd.CD_ID = col.CD_ID where tbl.TBL_NAME = "parttest";
      
      select part.SD_ID partition_sd_id, part.PART_NAME, col.COLUMN_NAME from PARTITIONS part join TBLS tbl on part.TBL_ID = tbl.TBL_ID join SDS sd on part.SD_ID = sd.SD_ID join COLUMNS_V2 col on col.CD_ID = sd.CD_ID where tbl.TBL_NAME = "parttest";
      

      Example:

      mysql> select tbl.TBL_NAME, tbl.TBL_ID, tbl.SD_ID table_sd_id, col.CD_ID, col.COLUMN_NAME from TBLS tbl join SDS sd on tbl.SD_ID = sd.SD_ID join COLUMNS_V2 col on sd.CD_ID = col.CD_ID where tbl.TBL_NAME = "parttest";
      +----------+--------+-------------+-------+-------------+
      | TBL_NAME | TBL_ID | table_sd_id | CD_ID | COLUMN_NAME |
      +----------+--------+-------------+-------+-------------+
      | parttest |   3775 |       11273 |  3781 | a           |
      | parttest |   3775 |       11273 |  3781 | b           |
      +----------+--------+-------------+-------+-------------+
      2 rows in set (0.01 sec)
      
      mysql>
      mysql> select part.SD_ID partition_sd_id, part.PART_NAME, col.COLUMN_NAME from PARTITIONS part join TBLS tbl on part.TBL_ID = tbl.TBL_ID join SDS sd on part.SD_ID = sd.SD_ID join COLUMNS_V2 col on col.CD_ID = sd.CD_ID where tbl.TBL_NAME = "parttest";
      +-----------------+-----------+-------------+
      | partition_sd_id | PART_NAME | COLUMN_NAME |
      +-----------------+-----------+-------------+
      |           11274 | c=1       | a           |
      |           11275 | c=2       | a           |
      +-----------------+-----------+-------------+
      2 rows in set (0.00 sec)
      

      Workarounds:

      • ALTER TABLE ... CASCADE in hive instead of impala which will update metadata for table and push to partitions.
      • COMPUTE STATS INCREMENTAL in Impala for affected partitions to update metadata.

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              SudarshanS Sudarshan
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated: