Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Duplicate
-
None
-
None
-
None
-
None
Description
HIVE-21784 uses a new WriterOptions instead of the field in OrcRecordUpdater:
https://github.com/apache/hive/commit/f62379ba279f41b843fcd5f3d4a107b6fcd04dec#diff-bb969e858664d98848960a801fd58b5cR580-R583
so in this scenario, the overwrite creates an empty bucket file, which is fine as that was the intention of that patch, but it creates that with invalid schema:
CREATE TABLE test_table ( cda_id int, cda_run_id varchar(255), cda_load_ts timestamp, global_party_id string) PARTITIONED BY ( cda_date int, cda_job_name varchar(12)) CLUSTERED BY (cda_id) INTO 2 BUCKETS STORED AS ORC; INSERT OVERWRITE TABLE test_table PARTITION (cda_date = 20200601 , cda_job_name = 'core_base') SELECT 1 as cda_id,'cda_run_id' as cda_run_id, NULL as cda_load_ts, 'global_party_id' global_party_id UNION ALL SELECT 2 as cda_id,'cda_run_id' as cda_run_id, NULL as cda_load_ts, 'global_party_id' global_party_id; ALTER TABLE test_table ADD COLUMNS (group_id string) CASCADE ; INSERT OVERWRITE TABLE test_table PARTITION (cda_date = 20200601 , cda_job_name = 'core_base') SELECT 1 as cda_id,'cda_run_id' as cda_run_id, NULL as cda_load_ts, 'global_party_id' global_party_id, 'group_id' as group_id;
because of HIVE-21784, the new empty bucket_00000 shows this schema in orc dump:
Type: struct<_col0:int,_col1:varchar(255),_col2:timestamp,_col3:string,_col4:string>
instead of:
Type: struct<operation:int,originalTransaction:bigint,bucket:int,rowId:bigint,currentTransaction:bigint,row:struct<cda_id:int,cda_run_id:varchar(255),cda_load_ts:timestamp,global_party_id:string,group_id:string>>
and this could lead to problems later, when hive tries to look into the file during split generation
Attachments
Attachments
Issue Links
- fixes
-
HIVE-23758 OrcInputFormat.getSargColumnNames might be more failsafe in case of schema mismatch
- Resolved
- is caused by
-
HIVE-21784 Insert overwrite on an acid (not mm) table is ineffective if the input is empty
- Closed
- Is contained by
-
HIVE-26751 Bug Fixes and Improvements for 3.2.0 release
- Open
- is part of
-
HIVE-22538 RS deduplication does not always enforce hive.optimize.reducededuplication.min.reducer
- Closed