IMPALA-11344

Selecting only the missing fields of ORC files should return NULLs

    Description

      While looking into IMPALA-11296, I found another bug in the same scenario (scanning only the missing columns of ORC files) on the current master branch.

      Create an ORC table with a field that is missing from the underlying files:

      hive> create external table missing_field_orc (f0 int) stored as orc;
      hive> insert into table missing_field_orc select 1;
      hive> alter table missing_field_orc add columns (f1 int);
      hive> select f1 from missing_field_orc;
      +-------+
      |  f1   |
      +-------+
      | NULL  |
      +-------+
      hive> select f0, f1 from missing_field_orc;
      +-----+-------+
      | f0  |  f1   |
      +-----+-------+
      | 1   | NULL  |
      +-----+-------+
      

      Run the same queries in Impala:

      impala> VERSION;
      Shell version: impala shell build version not available
      Server version: impalad version 4.2.0-SNAPSHOT DEBUG (build 7273cfdfb901b9ef564c2737cf00c7a8abb57f07)
      
      impala> invalidate metadata missing_field_orc;
      impala> select f1 from missing_field_orc;
      ERROR: Parse error in possibly corrupt ORC file: 'hdfs://localhost:20500/test-warehouse/missing_field_orc/000000_0'. No columns found for this scan.
      
      impala> select f0, f1 from missing_field_orc;
      +----+------+
      | f0 | f1   |
      +----+------+
      | 1  | NULL |
      +----+------+
      

      When only the column 'f1' is selected, the query fails with an error. It should return NULL, as Hive does and as Impala itself does when 'f0' is selected together with 'f1'.
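
      The expected semantics can be sketched as follows. This is a hypothetical illustration in Python, not Impala's ORC scanner code: the function `project_rows` and its parameters are assumptions made for this example. It shows that the row count comes from the file even when none of the selected columns is stored in it, so each such row is all NULLs rather than an error.

```python
from typing import Any, List, Optional


def project_rows(
    file_columns: List[str],
    file_rows: List[List[Any]],
    selected: List[str],
) -> List[List[Optional[Any]]]:
    """Project `selected` table columns out of rows stored with `file_columns`.

    Hypothetical helper: a column that is in the table schema but absent
    from the file (added later by ALTER TABLE ... ADD COLUMNS) reads as
    None, which stands in for SQL NULL here.
    """
    # Position of each column that is physically present in the file.
    position = {name: i for i, name in enumerate(file_columns)}
    result = []
    # Iterate over the file's rows so the row count is preserved even
    # when no selected column exists in the file.
    for row in file_rows:
        result.append(
            [row[position[col]] if col in position else None for col in selected]
        )
    return result


# The file was written when the table only had f0; f1 was added afterwards.
only_missing = project_rows(["f0"], [[1]], ["f1"])
both_columns = project_rows(["f0"], [[1]], ["f0", "f1"])
```

      In this sketch, `only_missing` corresponds to `select f1 from missing_field_orc` and `both_columns` to `select f0, f1 from missing_field_orc`.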

            People

              tangzhi Zhi Tang
              stigahuang Quanlong Huang