Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
Description
Parquet schema evolution should make it possible to have partitions/tables
backed by files with different schemas. Hive should match the table columns with file columns based on the column name if possible.
However if the serde for a table is missing columns from the serde of a partition Hive fails to match the columns together.
Steps to reproduce:
CREATE TABLE myparquettable_parted ( name string, favnumber int, favcolor string, age int, favpet string ) PARTITIONED BY (day string) STORED AS PARQUET; INSERT OVERWRITE TABLE myparquettable_parted PARTITION(day='2017-04-04') SELECT 'mary' as name, 5 AS favnumber, 'blue' AS favcolor, 35 AS age, 'dog' AS favpet; alter table myparquettable_parted REPLACE COLUMNS ( favnumber int, age int ); <!--- No cascade option, so the partition will not be altered.
SELECT * FROM myparquettable_parted where day='2017-04-04';
will fail with:
java.lang.UnsupportedOperationException: Cannot inspect org.apache.hadoop.io.IntWritable
Hive should either match the columns together or prevent the user from dropping columns from the table.
Attachments
Attachments
Issue Links
- links to