Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Fixed
-
2.2.0
-
None
Description
create table AcidTablePart(a int, b int) partitioned by (p string) clustered by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); create temporary table if not exists data1 (x int); insert into data1 values (1); from data1 insert into AcidTablePart partition(p) select 0, 0, 'p' || x insert into AcidTablePart partition(p='p1') select 0, 1
Each branch of this multi-insert create a row in partition p1/bucket0 with ROW__ID=(1,0,0).
The same can happen when running SQL Merge (HIVE-10924) statement that has both Insert and Update clauses when target table has 'transactional'='true','transactional_properties'='default' (see HIVE-14035). This is so because Merge is internally run as a multi-insert statement.
The solution relies on statement ID introduced in HIVE-11030. Each Insert clause of a multi-insert is gets a unique ID.
The ROW__ID.bucketId now becomes a bit packed triplet (format version, bucketId, statementId).
(Since ORC stores field names in the data file we can't rename ROW__ID.bucketId).
This ensures that there are no collisions and retains desired sort properties of ROW__ID.
In particular SortedDynPartitionOptimizer works w/o any changes even in cases where there fewer reducers than buckets.
Attachments
Attachments
Issue Links
- is related to
-
HIVE-17110 BucketCodec should enforce value ranges
- Closed
-
HIVE-18158 Remove OrcRawRecordMerger.ReaderPairAcid.statementId
- Closed
- is required by
-
HIVE-14947 Add support for Acid 2 in Merge
- Closed
- relates to
-
HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions
- Resolved