Details
- Bug
- Status: Open
- Major
- Resolution: Unresolved
- 4.0.0
- None
Description
ACID tables are bootstrapped during the incremental phase in two cases:
1. hive.repl.bootstrap.acid.tables is set to true in the WITH clause of REPL DUMP.
2. The replication policy is changed using the REPLACE clause in REPL DUMP, and the ACID table matches the new policy but not the old one.
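The two conditions above can be written as a small predicate. This is only an illustrative sketch, not Hive's implementation: the function name `needs_acid_bootstrap`, its parameters, and the modelling of a policy as a regular expression over table names are all assumptions made for the example.

```python
import re


def needs_acid_bootstrap(table, new_policy, old_policy=None,
                         bootstrap_acid_tables=False):
    """Return True if an ACID table would be bootstrapped in an incremental dump.

    Hypothetical model: a policy is a regex matched against the table name.
    """
    # A table outside the current policy is never dumped.
    if re.fullmatch(new_policy, table) is None:
        return False
    # Case 1: hive.repl.bootstrap.acid.tables=true in the WITH clause.
    if bootstrap_acid_tables:
        return True
    # Case 2: policy replaced; table matches the new policy but not the old one.
    if old_policy is not None:
        return re.fullmatch(old_policy, table) is None
    return False
```

For example, under these assumptions `needs_acid_bootstrap("tbl1", r"tbl\d+", old_policy=r"tbl2")` is true, because `tbl1` is newly included by the replaced policy.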
REPL DUMP performs the following sequence of operations. Call this thread T1:
1. Get Last Repl ID (lastId)
2. Open Transaction (Tx1)
3. Dump events until lastId.
4. Get the list of tables in the given DB.
5. If a table matches the current policy, bootstrap dump it.
Concurrently, another thread (T2) runs the following steps:
11. Open Transaction (Tx2).
12. Insert into ACID table Tbl1.
13. Commit Transaction (Tx2)
14. Drop table (Tbl1). This need not be the same thread; it may be issued from a different thread.
Problematic Use-cases:
1. Step-11 happens between Step-1 and Step-2, Step-13 completes before the REPL DUMP thread T1 forcefully aborts Tx2, and Step-14 is done after the bootstrap completes. In this case, the bootstrap replicates the data/writeId written by Tx2. However, the next incremental cycle also replicates the open_txn, allocate_writeid and commit_txn events, which duplicates the data.
2. Step-11 to Step-14 in thread T2 all happen after Step-1 in the REPL DUMP thread T1. In this case, the table is not bootstrapped, but the corresponding open_txn, allocate_writeid, commit_txn and drop events are replicated in the next cycle. During that cycle, REPL LOAD fails on the commit_txn event because the table is dropped or the event is missing.
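Both use-cases can be reproduced in a toy, single-threaded model of the interleavings. The sketch below is not Hive code: `Source`, `bootstrap`, `replay` and the event tuples are hypothetical stand-ins for the notification log, the bootstrap dump and REPL LOAD, and the commit_txn event is assumed to carry the rows written by the transaction.

```python
class Source:
    """Toy source warehouse: an event log plus committed table contents."""

    def __init__(self):
        self.events = []            # (event_id, kind, table, rows)
        self.tables = {"Tbl1": []}  # table name -> committed rows

    def log(self, kind, table=None, rows=()):
        self.events.append((len(self.events) + 1, kind, table, list(rows)))

    def last_repl_id(self):         # Step-1
        return len(self.events)

    def write_and_commit(self, table, rows):  # Step-11 to Step-13
        self.log("open_txn")
        self.log("allocate_writeid", table)
        self.tables[table].extend(rows)
        self.log("commit_txn", table, rows)

    def drop(self, table):          # Step-14
        del self.tables[table]
        self.log("drop_table", table)


def bootstrap(source):
    # Step-4 and Step-5: snapshot every table that exists right now.
    return {name: list(rows) for name, rows in source.tables.items()}


def replay(target, events):
    # Next incremental cycle: apply events logged after lastId.
    for _, kind, table, rows in events:
        if kind == "commit_txn":
            if table not in target:
                raise RuntimeError(f"commit_txn failed: table {table} not found")
            target[table].extend(rows)
        elif kind == "drop_table":
            target.pop(table, None)


def use_case_1():
    src = Source()
    last_id = src.last_repl_id()            # Step-1
    src.write_and_commit("Tbl1", ["row1"])  # Tx2 commits before the bootstrap
    target = bootstrap(src)                 # snapshot already contains row1
    replay(target, src.events[last_id:])    # commit_txn replays row1 again
    return target["Tbl1"]


def use_case_2():
    src = Source()
    last_id = src.last_repl_id()            # Step-1
    src.write_and_commit("Tbl1", ["row1"])
    src.drop("Tbl1")                        # table gone before Step-4
    target = bootstrap(src)                 # Tbl1 is not bootstrapped
    try:
        replay(target, src.events[last_id:])
    except RuntimeError as err:
        return str(err)
    return "ok"
```

In this model `use_case_1()` ends with the row present twice on the target, and `use_case_2()` returns the commit_txn failure message, mirroring the two outcomes described above.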
Issue Links
- is related to
- HIVE-21529 Hive support bootstrap of ACID/MM tables on an existing policy. (Closed)
- HIVE-21763 Incremental replication to allow changing include/exclude tables list in replication policy. (Closed)
- HIVE-21880 Enable flaky test TestReplicationScenariosAcidTablesBootstrap.testBootstrapAcidTablesDuringIncrementalWithConcurrentWrites. (Resolved)