Details
Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Hive ACID supports row-level DELETE and UPDATE operations on a table. It achieves this by assigning a unique row ID to each row and maintaining two sets of files in the table. The first set lives in the delta directories and contains the INSERTed rows. The second set lives in the delete-delta directories and contains the DELETEd rows.
Note: UPDATE operations are implemented via DELETE+INSERT.
On the filesystem this looks like, for example:
full_acid/delta_0000001_0000001_0000/0000_0
full_acid/delete_delta_0000002_0000002_0000/0000_0
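To illustrate the layout, here is a minimal sketch that splits a list of file paths into insert files and delete files based only on the parent directory prefix (`delta_` versus `delete_delta_`). The function name `classify_acid_files` is hypothetical and is not part of Hive or Impala; the sketch ignores every other directory type a real table may contain.

```python
from pathlib import PurePosixPath


def classify_acid_files(paths):
    """Split paths into (insert_files, delete_files) by parent directory prefix.

    Hypothetical helper for illustration only: it looks at nothing but the
    name of the directory that directly contains each file.
    """
    insert_files, delete_files = [], []
    for path in paths:
        parent = PurePosixPath(path).parent.name
        # Check the longer prefix first so the two kinds are never confused.
        if parent.startswith("delete_delta_"):
            delete_files.append(path)
        elif parent.startswith("delta_"):
            insert_files.append(path)
        # Anything else is skipped in this sketch.
    return insert_files, delete_files
```

Applied to the two example paths above, the first lands in the insert list and the second in the delete list.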
During scanning we need to return INSERTed rows minus DELETEd rows. One way of doing that is to create an ANTI JOIN between INSERT and DELETE events.
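The anti-join idea can be sketched in a few lines. This is only an in-memory illustration, not the planner or executor code: row IDs are modeled as plain integers, and the names `scan_acid_rows`, `insert_events`, and `delete_events` are assumptions made for the example.

```python
def scan_acid_rows(insert_events, delete_events):
    """Return INSERTed rows whose row ID has no matching DELETE event.

    insert_events: iterable of (row_id, row) pairs read from delta files.
    delete_events: iterable of row_id values read from delete-delta files.
    This is a hash-based left anti join keyed on row_id.
    """
    deleted_ids = set(delete_events)  # build side of the anti join
    return [(row_id, row) for row_id, row in insert_events
            if row_id not in deleted_ids]  # keep only unmatched probe rows


# An UPDATE of row 2 appears as a DELETE of row 2 plus an INSERT of row 3.
inserts = [(1, "alice"), (2, "bob"), (3, "bob-updated")]
deletes = [2]
visible = scan_acid_rows(inserts, deletes)
# visible == [(1, "alice"), (3, "bob-updated")]
```

Building the hash set from the DELETE side is a natural choice when delete events are far fewer than inserted rows, but that is an assumption about the workload rather than something stated in this ticket.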