HIVE-5317

Implement insert, update, and delete in Hive with full ACID support

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Many customers want to be able to insert, update, and delete rows from Hive tables with full ACID support. The use cases are varied, but the forms of queries that should be supported are listed below (an illustrative sketch follows the list):

      • INSERT INTO tbl SELECT …
      • INSERT INTO tbl VALUES ...
      • UPDATE tbl SET … WHERE …
      • DELETE FROM tbl WHERE …
      • MERGE INTO tbl USING src ON … WHEN MATCHED THEN ... WHEN NOT MATCHED THEN ...
      • SET TRANSACTION LEVEL …
      • BEGIN/END TRANSACTION
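
      An illustrative, hypothetical sketch of what these statements might look like in HiveQL; the table and column names (customer, customer_staging) are made up, and the MERGE shape follows the SQL standard rather than any committed Hive grammar:

      -- Hypothetical tables and columns; the final Hive syntax may differ.
      INSERT INTO customer VALUES (42, 'Acme Corp', 'San Jose');

      UPDATE customer
      SET city = 'Palo Alto'
      WHERE cust_id = 42;

      DELETE FROM customer
      WHERE cust_id = 42;

      -- Apply a batch of changes exported from an operational system.
      MERGE INTO customer tgt
      USING customer_staging src
      ON tgt.cust_id = src.cust_id
      WHEN MATCHED THEN UPDATE SET name = src.name, city = src.city
      WHEN NOT MATCHED THEN INSERT VALUES (src.cust_id, src.name, src.city);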

      Use Cases

      • Once an hour, a set of inserts and updates (up to 500k rows) for various dimension tables (e.g. customer, inventory, stores) needs to be processed. The dimension tables have primary keys and are typically bucketed and sorted on those keys (see the DDL sketch after this list).
      • Once a day a small set (up to 100k rows) of records needs to be deleted for regulatory compliance.
      • Once an hour a log of transactions is exported from an RDBMS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed.
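
      For concreteness, tables of the kinds described above might be declared roughly as follows. The names, columns, and bucket counts are hypothetical, and whatever table property eventually marks a table as transactional is deliberately omitted, since it is not yet defined:

      -- Hypothetical dimension table, bucketed and sorted on its primary key.
      CREATE TABLE customer (
        cust_id BIGINT,
        name STRING,
        city STRING
      )
      CLUSTERED BY (cust_id) SORTED BY (cust_id ASC) INTO 32 BUCKETS
      STORED AS ORC;

      -- Hypothetical fact table, partitioned by day and bucketed on a key.
      CREATE TABLE sales (
        sale_id BIGINT,
        cust_id BIGINT,
        amount DOUBLE
      )
      PARTITIONED BY (dt STRING)
      CLUSTERED BY (cust_id) INTO 64 BUCKETS
      STORED AS ORC;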

        Issue Links

          Activity

          Carl Steinbach added a comment -

          Will these features place any limitations on which storage formats you can use? Also, I don't think it's possible to support ACID guarantees and HCatalog (i.e. file permission based authorization) simultaneously on top of the same Hive warehouse. Is there a plan in place for fixing that?

          Alan Gates added a comment -

          The only requirement is that the file format must be able to support a rowid. With things like text and sequence file this can be done via a byte offset.

          I'm not seeing why this falls apart with file-based authorization. Are you worried that different users will own the base and delta files? It's no different than the current case where different users may own different partitions. We will need to make sure the compactions can still happen in this case, that is, that the compaction can be run as the user who owns the table, not as Hive.

          Owen O'Malley added a comment -

          Here are my thoughts about how it can be approached.

          Brock Noland added a comment -

          Just curious: I was surprised I didn't see adding transactions to HBase plus support in the HBase storage handler as a potential alternative implementation. Could you speak to why your approach is superior to that approach? Also, it'd be great if you posted the design document in the design docs section of the wiki: https://cwiki.apache.org/confluence/display/Hive/DesignDocs

          Alan Gates added a comment -

          Brock, we did look at that. We didn't go that route for a couple of reasons:

          1. Adding transactions to HBase is a fair amount of work. See Google's Percolator paper for one approach to that.
          2. HBase can't offer the same scan speed as HDFS. Since we're choosing to focus this on updates done in OLAP-style workloads, HBase isn't going to be a great storage mechanism for the data. I agree it might make sense to have transactions on HBase for a more OLTP-style workload.
          Owen O'Malley added a comment -

          Expanding on Alan's comments:

          • The HBase scan rate is much lower than HDFS's, especially with short-circuit reads enabled.
          • HBase is tuned for write-heavy workloads.
          • HBase doesn't have a columnar format and can't support column projection.
          • HBase doesn't have predicate pushdown into the file format.
          • HBase doesn't have the equivalent of partitions or buckets.
          stack added a comment -

          Alan Gates

          Looks like a bunch of hbase primitives done as mapreduce jobs.

          At first blush, on 1., Percolator would be a bunch of work but looks like less than what is proposed here (would you need Percolator given you write the transaction id into the row?). On 2., if HBase were made to write ORC, couldn't you MR the files HBase writes after asking HBase to snapshot?

          stack added a comment -

          The HBase scan rate is much lower than HDFS's, especially with short-circuit reads enabled.

          What kind of numbers are you talking about, Owen? Would be interested in knowing what they are. Is the implication also that it cannot be improved? Or would scanning the files written by HBase offline from a snapshot not work for you? (Snapshots are cheap in HBase; going by your use cases, you'd be doing these runs infrequently enough.)

          HBase is tuned for write-heavy workloads.

          Funny. Often we're accused of the other extreme.

          HBase doesn't have a columnar format and can't support column projection.

          It doesn't. Too much work to add a storage engine that wrote columnar?

          HBase doesn't have the equivalent of partitions or buckets.

          In HBase we call them 'Regions'.

          Bikas Saha added a comment -

          Some questions which I am sure have been considered but are not clear in the document.
          Should the metastore heartbeat be in the job itself rather than the client, since the job is the source of truth and the client can disappear? What happens if the client disappears but the job completes successfully and manages to promote the output files?
          Is the transaction id per file or per metastore? Where does the metastore recover the last transaction id(s) from after a restart?

          Owen O'Malley added a comment -

          Bikas,
          In Hive, if the client disappears, the query fails, because the final work (output promotion, display to the user) is done by the client. Also don't forget that a single query may be composed of many MR jobs, although obviously that changes on Tez.

          The transaction id is global for all of the tasks working on the same query.

          The metastore's data is stored in an underlying SQL database, so the transaction information will need to be there also.

          Eric Hanson added a comment -

          Overall this looks like a workable approach given the use cases described (mostly coarse-grained updates with a low transaction rate), and it has the benefit that it doesn't take a dependency on another large piece of software like an update-aware DBMS or NoSQL store.

          Regarding use cases, it appears that this design won't be able to have fast performance for fine-grained inserts. E.g. there might be scenarios where you want to insert one row into a fact table every 10 milliseconds in a separate transaction and have the rows immediately visible to readers. Are you willing to forgo that use case? It sounds like yes. This may be reasonable. If you want to handle it then a different design for the delta insert file information is probably needed, i.e. a store that's optimized for short write transactions.

          I didn't see any obvious problem, due to the versioned scans, but is this design safe from the Halloween problem? That's the problem where an update scan sees its own updates again, causing an infinite loop or incorrect update. An argument that the design is safe from this would be good.

          You mention that you will have one type of delta file that encodes updates directly, for sorted files. Is this really necessary, or can you make updates illegal for sorted files? If updates can always be modelled as insert plus delete, that simplifies things.

          How do you ensure that the delta files are fully written (committed) to the storage system before the metastore treats the transaction that created the delta file as committed?

          It's not completely clear why you need exactly the transaction ID information specified in the delta file names. E.g. would just the transaction ID (start timestamp) be enough? A precise specification of how they are used would be useful.

          Explicitly explaining what happens when a transaction aborts and how its delta files get ignored and then cleaned up would be useful.

          Is there any issue with correctness of task retry in the presence of updates if a task fails? It appears that it is safe due to the snapshot isolation. Explicitly addressing this in the specification would be good.

          Alan Gates added a comment -

          One thing that might help people understand the design: take a look at http://research.microsoft.com/pubs/193599/Apollo3%20-%20Sigmod%202013%20-%20final.pdf, a paper that influenced our thinking and design.

          Alan Gates added a comment -

          Regarding use cases, it appears that this design won't be able to have fast performance for fine-grained inserts. ...

          Agreed, this will fail badly in a one-insert-at-a-time situation. That isn't what we're going after. We would like to be able to handle a batch of inserts every minute, but for the moment that seems like the floor.

          I didn't see any obvious problem, due to the versioned scans, but is this design safe from the Halloween problem?

          As a rule Hive jobs always define their input up front and then scan only once. So even though an update is writing a new record, the delta file it's writing into shouldn't be defined as part of its input. In the future when we move to having one delta file rather than one per write (more details on that to follow), this may be more of an issue, and we'll need to think about how to avoid it.

          How do you ensure that the delta files are fully written (committed) to the storage system before the metastore treats the transaction that created the delta file as committed?

          The OutputCommitter will move the new delta files from a temp directory to the directory of the base file (as is standard in Hadoop apps). Only after this will the Hive client communicate to the metastore that the transaction is committed. If there is a failure after the files are moved from the temp directory to the base directory but before the commit is recorded, readers will still ignore these files, as they will have a transaction id that is listed as aborted.

          It's not completely clear why you need exactly the transaction ID information specified in the delta file names. E.g. would just the transaction ID (start timestamp) be enough?

          The reason for including the end id is so that readers can quickly decide whether they need to scan that file at all, and potentially prune files from their scans. Does that answer the question?

          Is there any issue with correctness of task retry in the presence of updates if a task fails?

          As in standard Hadoop practice, output from tasks will be written to a temp directory. Failed or killed tasks' output will never be promoted to the base file directory and thus will never be seen by readers.

          I'm working on updating the doc with answers to these. One of us will post the updated doc soon.

          Eric Hanson added a comment -

          Okay, thanks for the response!

          Kelly Stirman added a comment -

          I'm curious: why not use ZK to maintain transactional state?

          Hive metastore, if I'm not mistaken, is not HA by default, and it imposes the associated complexity of HA (for MySQL and PG at least) on the user.

          Owen O'Malley added a comment -

          Hive already depends on the metastore being up, so it isn't adding a new SPoF. Zookeeper adds additional semantic complexity, especially for highly dynamic data.

          Edward Capriolo added a comment -

          I have two fundamental problems with this concept.

          The only requirement is that the file format must be able to support a rowid. With things like text and sequence file this can be done via a byte offset.

          This is a good reason not to do this. Things that only work for some formats create fragmentation. What about formats that do not have a row id? What if the user is already using the key for something else, like data?

          Once an hour a log of transactions is exported from an RDBMS and the fact tables need to be updated (up to 1m rows) to reflect the new data. The transactions are a combination of inserts, updates, and deletes. The table is partitioned and bucketed.

          What this ticket describes seems like a bad use case for Hive. Why would the user not simply create a new table partitioned by hour? What is the need to transactionally update a table in place?

          It seems like the better solution would be for the user to log these updates themselves and then export the table with a tool like Sqoop periodically.

          I see this as a really complicated piece of work, for a narrow use case, and I have a very difficult time believing adding transactions to Hive to support this is the right answer.

          Edward Capriolo added a comment -

          By the way, I do work like this very often, and having tables that update periodically causes a lot of problems. The first is when you have to re-compute a result 4 days later.

          You do not want a fresh, up-to-date table; you want the table as it existed 4 days ago. When you want to troubleshoot a result you do not want your intermediate tables trampled over. When you want to rebuild a month's worth of results you want to launch 31 jobs in parallel, not 31 jobs in series.

          In fact, in programming Hive I suggest ALWAYS partitioning these dimension tables by time and NOT doing what this ticket is describing, for the reasons above (and more).

          Owen O'Malley added a comment -

          Ed,
          If you don't use the insert, update, and delete commands, they won't impact your use of Hive. On the other hand, there are a large number of users who need ACID and updates.

          Thejas M Nair added a comment -

          Ed, for the data re-processing use case, this approach is not what is recommended. This approach is meant to be used for use cases where your changes to a partition are a small fraction of the existing number of rows.
          Even with this approach, it still would make sense to partition your data by time for 'fact tables'. Your dimension table has new records being added periodically, making it more like the 'fact table' use case. This approach will also work with tables partitioned by time.

          Edward Capriolo added a comment -

          Ed,
          If you don't use the insert, update, and delete commands, they won't impact your use of Hive. On the other hand, there are a large number of users who need ACID and updates.

          Why don't those users just use an ACID database?

          The dimension tables have primary keys and are typically bucketed and sorted on those keys.

          All the use cases defined seem to be exactly what Hive is not built for:
          1) Hive does not do much/any optimization of a table when it is sorted.
          2) Hive tables do not have primary keys.
          3) Hive is not made to play with tables of only a few rows.

          It seems like the idea is to turn Hive and the Hive metastore into a one-shot database for processes that can easily be done differently.

          Once a day a small set (up to 100k rows) of records needs to be deleted for regulatory compliance.

          1. Sqoop export to an RDBMS.
          2. Run the query on the RDBMS.
          3. Write back to Hive (a sketch follows this list).
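
          A rough sketch of step 3 above, assuming the regulated rows have been removed externally and the cleaned data landed in a hypothetical staging table (events_staging); the table, partition column, and column names are illustrative only:

          -- Hypothetical: rewrite one partition of the Hive table from the staged copy.
          INSERT OVERWRITE TABLE events PARTITION (dt='2013-09-18')
          SELECT event_id, cust_id, amount
          FROM events_staging
          WHERE dt = '2013-09-18';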

          I am not ready to vote -1, but I am struggling to understand why anyone would want to use Hive to solve the use cases described. This seems like a square-peg-in-a-round-hole solution. It feels like something that belongs outside of Hive.

          It feels a lot like this:
          http://db.cs.yale.edu/hadoopdb/hadoopdb.html

          Edward Capriolo added a comment -

          "In theory the base can be in any format, but ORC will be required for v1"

          This is exactly what I talk about when I talk about fragmentation. Hive cannot be a system where features only work when using a specific input format. The feature must be applicable to more than just a single file format. Tagging "other file formats" as "LATER" bothers me. Wouldn't the community have more utility if something that worked against a TextFormat was written first, then later against other formats? I know about the "Stinger initiative", but developing features that only work with specific input formats does not seem like the correct course of action. It goes against our core design principles:

          https://cwiki.apache.org/confluence/display/Hive/Home

          "Hive does not mandate read or written data be in the "Hive format"---there is no such thing. Hive works equally well on Thrift, control delimited, or your specialized data formats. Please see File Format and SerDe in the Developer Guide for details."

          Sergey Shelukhin added a comment -

          I think "the small number of rows" meant above was for the update, not the entire partition.
          So, large dataset, small number of rows updated. Exporting entire dataset to rdbms to perform a query seems excessive in this case

          Lefty Leverenz added a comment -

          Off topic: This ticket has 100 watchers. Is that a record?

          Alan Gates added a comment -

          MAPREDUCE-279, at 109, currently outscores us. There may be others, but it would be cool to have more watchers than YARN.

          Vinod Kumar Vavilapalli added a comment -

          MAPREDUCE-279, at 109, currently outscores us. There may be others, but it would be cool to have more watchers than YARN.

          Hehe, looks like we have a race. I'll go ask some of us YARN folks who are also watching this JIRA to stop watching this one.

          Pardeep Kumar added a comment -

          Vinod, it is very much obvious: ACID, updates, and deletes are among the most awaited features of Hive, and many people like me are waiting for them.

          Pardeep Kumar added a comment -

          Will these features be supported on all Hive file formats, i.e. SequenceFile, Text, ORC, RCFile, etc.?

          Alan Gates added a comment -

          Currently they are being supported in ORC. It is done in such a way that it could be extended to any file format that can support a row id, though there is some code to write to make it happen. It could be extended to support text or sequence file by using the offset in the base file as a surrogate for the row id. I'm not sure whether this would work for RCFile or not.


            People

            • Assignee: Owen O'Malley
            • Reporter: Owen O'Malley
            • Votes: 15
            • Watchers: 120
