Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
0.14.0
None
Description
create table T(a int, b int) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES('transactional'='false');
insert into T(a,b) values(1,2);
insert into T(a,b) values(1,3);
alter table T SET TBLPROPERTIES ('transactional'='true');
-- we should now have bucket files 000001_0 and 000001_0_copy_1
However, OrcRawRecordMerger.OriginalReaderPair.next() does not know that copy_N files can exist. It numbers the rows in each bucket file starting from 0, and thus generates duplicate ROW__IDs.
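To make the numbering problem concrete, here is a hypothetical Java sketch. The class, record, and method names are illustrative assumptions, not Hive's OrcRawRecordMerger code. It contrasts numbering each file from 0, which yields the duplicate IDs in the output below, with one possible remedy: offsetting each file's row numbers by the total row count of the files that precede it in the same bucket.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hive's OrcRawRecordMerger: synthetic ROW__IDs for
// rows read from pre-ACID ("original") files of one bucket.
public class OriginalRowIdSketch {

    // Original files get transactionid 0; only bucketid and rowid vary.
    record RowId(long transactionId, int bucketId, long rowId) {
        @Override
        public String toString() {
            return "{\"transactionid\":" + transactionId + ",\"bucketid\":" + bucketId
                    + ",\"rowid\":" + rowId + "}";
        }
    }

    // Behaviour described in this issue: every file restarts at rowid 0,
    // so 000001_0 and 000001_0_copy_1 both produce rowid 0.
    static List<RowId> numberPerFile(int bucketId, long[] rowsPerFile) {
        List<RowId> ids = new ArrayList<>();
        for (long rows : rowsPerFile) {
            for (long r = 0; r < rows; r++) {
                ids.add(new RowId(0, bucketId, r));
            }
        }
        return ids;
    }

    // Assumed remedy: shift each file's rowids by the number of rows in the
    // files that come before it (files must be visited in a fixed order).
    static List<RowId> numberWithOffset(int bucketId, long[] rowsPerFile) {
        List<RowId> ids = new ArrayList<>();
        long offset = 0;
        for (long rows : rowsPerFile) {
            for (long r = 0; r < rows; r++) {
                ids.add(new RowId(0, bucketId, offset + r));
            }
            offset += rows;
        }
        return ids;
    }

    public static void main(String[] args) {
        // Bucket 1: 000001_0 holds (1,2), 000001_0_copy_1 holds (1,3).
        long[] rowsPerFile = {1, 1};
        System.out.println(numberPerFile(1, rowsPerFile));    // two ids with rowid 0
        System.out.println(numberWithOffset(1, rowsPerFile)); // rowids 0 and 1
    }
}
```

The offset approach only yields stable IDs if every reader (and the compactor) visits a bucket's files in the same order and sees the same per-file row counts.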
select ROW__ID, INPUT__FILE__NAME, a, b from T
produces
{"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/000001_0,1,2
{"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/000001_0_copy_1,1,3
owen.omalley, do you have any thoughts on a good way to handle this?
The attached patch has a few changes that make Acid recognize copy_N files at all, but this is just a prerequisite. The new unit test demonstrates the issue.
Furthermore,
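Recognizing copy_N files starts with parsing bucket file names. The following is a hypothetical Java sketch; the regular expression, class, and record names are assumptions for illustration, not Hive's AcidUtils API. It accepts names such as 000001_0 and 000001_0_copy_1, extracts the bucket id and copy number (0 for the file without a suffix), and rejects names like bucket_00001 that belong to ACID base/delta directories.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parse pre-ACID bucket file names, including _copy_N.
public class OriginalFileName {

    // group 1 = bucket id, group 2 = writer suffix, group 3 = optional copy number
    private static final Pattern ORIGINAL =
            Pattern.compile("^(\\d+)_(\\d+)(?:_copy_(\\d+))?$");

    record Parsed(int bucketId, int copyNumber) {}

    // copyNumber is 0 for the file that has no _copy_N suffix.
    static Optional<Parsed> parse(String fileName) {
        Matcher m = ORIGINAL.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty(); // e.g. bucket_00001 inside an ACID base/delta dir
        }
        int bucket = Integer.parseInt(m.group(1)); // "000001" -> 1
        int copy = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return Optional.of(new Parsed(bucket, copy));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"000001_0", "000001_0_copy_1", "bucket_00001"}) {
            System.out.println(name + " -> " + parse(name));
        }
    }
}
```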
alter table T compact 'major';
select ROW__ID, INPUT__FILE__NAME, a, b from T order by b;
produces
{"transactionid":0,"bucketid":1,"rowid":0} file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands....warehouse/nonacidorctbl/base_-9223372036854775808/bucket_00001 1 2
HIVE-16177.04.patch includes TestTxnCommands.testNonAcidToAcidConversion0(), which demonstrates this.
Only the row from 000001_0 survives, because the compactor doesn't handle copy_N files either (it skips them).
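For compaction to keep every row, its input for a bucket has to include the copy_N files, visited in the same order a reader would use. A hypothetical Java sketch of that grouping step follows; the class and method names and the ordering rule (writer suffix, then copy number) are assumptions, not Hive's Compactor implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collect every original file of each bucket, including
// _copy_N files, in a deterministic order.
public class BucketFileGrouping {

    private static final Pattern ORIGINAL =
            Pattern.compile("^(\\d+)_(\\d+)(?:_copy_(\\d+))?$");

    // Returns a matched Matcher, or null if the name is not an original bucket file.
    private static Matcher match(String name) {
        Matcher m = ORIGINAL.matcher(name);
        return m.matches() ? m : null;
    }

    private static int copyNumber(Matcher m) {
        return m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
    }

    // bucket id -> file names: plain file first, then copy_1, copy_2, ...
    static Map<Integer, List<String>> groupByBucket(List<String> fileNames) {
        Map<Integer, List<String>> byBucket = new TreeMap<>();
        for (String name : fileNames) {
            Matcher m = match(name);
            if (m == null) {
                continue; // not an original bucket file; ignored in this sketch
            }
            int bucket = Integer.parseInt(m.group(1));
            byBucket.computeIfAbsent(bucket, b -> new ArrayList<>()).add(name);
        }
        // Numeric ordering, so that copy_10 sorts after copy_2.
        Comparator<String> order = Comparator
                .<String>comparingInt(n -> Integer.parseInt(match(n).group(2)))
                .thenComparingInt(n -> copyNumber(match(n)));
        byBucket.values().forEach(files -> files.sort(order));
        return byBucket;
    }

    public static void main(String[] args) {
        List<String> files = List.of("000001_0_copy_2", "000001_0", "000000_0", "000001_0_copy_1");
        // prints {0=[000000_0], 1=[000001_0, 000001_0_copy_1, 000001_0_copy_2]}
        System.out.println(groupByBucket(files));
    }
}
```

Sorting on parsed numbers rather than on the raw strings matters here: lexicographic order would place 000001_0_copy_10 before 000001_0_copy_2 and shift the row offsets of every later file.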
Attachments
Issue Links
blocks
- HIVE-17069 Refactor OrcRawRecrodMerger.ReaderPair (Closed)
is related to
- HIVE-17526 Disable conversion to ACID if table has _copy_N files on branch-1 (Resolved)
- HIVE-16732 Transactional tables should block LOAD DATA (Closed)
relates to
- HIVE-15899 Make CTAS with acid target table and insert into acid_tbl select ... union all ... work (Closed)
- HIVE-12724 ACID: Major compaction fails to include the original bucket files into MR job (Closed)
- HIVE-13961 ACID: Major compaction fails to include the original bucket files if there's no delta directory (Closed)
- HIVE-14366 Conversion of a Non-ACID table to an ACID table produces non-unique primary keys (Closed)
- HIVE-11525 Bucket pruning (Closed)
requires
- HIVE-16964 _orc_acid_version file is missing (Closed)
links to