Details
- Bug
- Status: Closed
- Major
- Resolution: Duplicate
- 4.0.0
- None
Description
Suppose we run minor compaction twice via ALTER TABLE ... COMPACT 'minor'.
The second compaction request should have nothing to do, but I don't think there is a check for that. This is visible in the context of HIVE-20823, where each compactor run produces a delta with a new visibility suffix, so we end up with something like:
target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
├── delete_delta_0000001_0000002_v0000019
│   ├── _orc_acid_version
│   └── bucket_00000
├── delete_delta_0000001_0000002_v0000021
│   ├── _orc_acid_version
│   └── bucket_00000
├── delta_0000001_0000001_0000
│   ├── _orc_acid_version
│   └── bucket_00000
├── delta_0000001_0000002_v0000019
│   ├── _orc_acid_version
│   └── bucket_00000
├── delta_0000001_0000002_v0000021
│   ├── _orc_acid_version
│   └── bucket_00000
└── delta_0000002_0000002_0000
    ├── _orc_acid_version
    └── bucket_00000
i.e. two deltas (and two delete deltas) with the same write ID range, differing only in the visibility suffix.
This is bad. It probably happens today as well, except that the new run produces a delta with the same name and clobbers the previous one, which may interfere with writers.
Need to investigate.
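To make the symptom easier to spot while investigating, here is a minimal sketch that scans a list of directory names and reports compactor outputs sharing a write ID range. The class name `DuplicateDeltaRanges`, the method `findDuplicates`, and the regular expression are assumptions made for illustration; this is not Hive's `AcidUtils` parsing code. It also assumes that a `_v<txnid>` suffix marks compactor output, as in the listing above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive.
public class DuplicateDeltaRanges {
    // Matches e.g. delta_0000001_0000002_v0000019 (visibility suffix, group 4)
    // and delta_0000001_0000001_0000 (statement ID, no visibility suffix).
    private static final Pattern DELTA_DIR =
        Pattern.compile("(delete_delta|delta)_(\\d+)_(\\d+)(?:_v(\\d+)|_\\d+)?");

    /** Returns kind:min-max -> directory names, only for ranges seen at least twice. */
    static Map<String, List<String>> findDuplicates(List<String> dirNames) {
        Map<String, List<String>> byRange = new LinkedHashMap<>();
        for (String name : dirNames) {
            Matcher m = DELTA_DIR.matcher(name);
            // Skip anything that is not a delta with a visibility suffix.
            // Deltas with statement IDs may legitimately share a range.
            if (!m.matches() || m.group(4) == null) {
                continue;
            }
            String key = m.group(1) + ":" + Long.parseLong(m.group(2))
                + "-" + Long.parseLong(m.group(3));
            byRange.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        // Keep only the ranges that occur in more than one directory.
        byRange.values().removeIf(dirs -> dirs.size() < 2);
        return byRange;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList(
            "delete_delta_0000001_0000002_v0000019",
            "delete_delta_0000001_0000002_v0000021",
            "delta_0000001_0000001_0000",
            "delta_0000001_0000002_v0000019",
            "delta_0000001_0000002_v0000021",
            "delta_0000002_0000002_0000");
        findDuplicates(dirs).forEach((range, names) ->
            System.out.println(range + " -> " + names));
    }
}
```

Run against the listing above, this reports two groups: one for the delete deltas and one for the deltas covering write IDs 1 to 2.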
Initial suspicion (since retracted): AcidUtils.getAcidState() returns both deltas as if they were distinct, which would effectively duplicate data.
Correction: there is no data duplication. getAcidState() will not use two deltas with the same write ID range.
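The missing "nothing to do" check mentioned at the top of the description could look roughly like the sketch below. It is an illustration only, under the assumption that a minor compaction is redundant when an existing compactor output (a directory with a `_v<txnid>` suffix) already covers the full write ID range of every delta present. The names `CompactionNoOpCheck` and `nothingToCompact` are hypothetical, and this is not the logic Hive adopted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hive.
public class CompactionNoOpCheck {
    // Group 1: min write ID, group 2: max write ID,
    // group 3: visibility suffix (present only on compactor output, by assumption).
    private static final Pattern DELTA_DIR =
        Pattern.compile("(?:delete_)?delta_(\\d+)_(\\d+)(?:(_v\\d+)|_\\d+)?");

    /** True if some compacted delta already spans every write ID seen in dirNames. */
    static boolean nothingToCompact(List<String> dirNames) {
        long lowest = Long.MAX_VALUE;
        long highest = Long.MIN_VALUE;
        // First pass: overall write ID range across all delta directories.
        for (String name : dirNames) {
            Matcher m = DELTA_DIR.matcher(name);
            if (m.matches()) {
                lowest = Math.min(lowest, Long.parseLong(m.group(1)));
                highest = Math.max(highest, Long.parseLong(m.group(2)));
            }
        }
        if (lowest == Long.MAX_VALUE) {
            return true; // no delta directories at all
        }
        // Second pass: look for compactor output that covers the whole range.
        for (String name : dirNames) {
            Matcher m = DELTA_DIR.matcher(name);
            if (m.matches() && m.group(3) != null
                    && Long.parseLong(m.group(1)) <= lowest
                    && Long.parseLong(m.group(2)) >= highest) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // State after the first compaction in the example above.
        List<String> afterFirstRun = Arrays.asList(
            "delete_delta_0000001_0000002_v0000019",
            "delta_0000001_0000001_0000",
            "delta_0000001_0000002_v0000019",
            "delta_0000002_0000002_0000");
        System.out.println(nothingToCompact(afterFirstRun));
    }
}
```

With such a guard in place, the second ALTER TABLE request in the scenario above would be skipped, while a new write (for example a hypothetical `delta_0000003_0000003_0000`) would make compaction eligible again.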
Attachments
Issue Links
- is fixed by
  - HIVE-9995 ACID compaction tries to compact a single file (Closed)
- is related to
  - HIVE-20823 Make Compactor run in a transaction (Closed)
  - HIVE-9995 ACID compaction tries to compact a single file (Closed)
  - HIVE-20941 Compactor produces a delete_delta_x_y even if there are no input delete events (Closed)