Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Fixed
- Versions: 3.1.0, 3.0.0, 3.1.1
Description
Existing deployments that use Hive replication do not get external tables replicated. To enable external table replication, such deployments must pass a specific switch that first bootstraps the external tables as part of a Hive incremental replication cycle. After that, incremental replication takes care of further changes to external tables.
The switch will be provided by an additional Hive configuration (for example, hive.repl.bootstrap.external.tables) and is to be used in the WITH clause of the REPL DUMP command.
Additionally, the existing Hive config hive.repl.include.external.tables must always be set to "true" in the same WITH clause.
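As a rough illustration of the WITH clause described above, a dump command might look like the following. This is a sketch, not the final syntax: the event ID 1000 is a hypothetical placeholder for the last replicated event, and hive.repl.bootstrap.external.tables is the config name proposed in this ticket.

```sql
-- Sketch only. 1000 is a hypothetical last-replicated event ID.
-- Both properties are passed per command through the WITH clause.
REPL DUMP db1 FROM 1000
WITH (
  'hive.repl.include.external.tables'='true',
  'hive.repl.bootstrap.external.tables'='true'
);
```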
Proposed usage for enabling external table replication on an existing replication policy:
1. Consider an ongoing repl policy <db1> in the incremental phase.
Set hive.repl.include.external.tables=true and hive.repl.bootstrap.external.tables=true in the next incremental REPL DUMP.
- Dumps all events but skips events related to external tables.
- Instead, combines the bootstrap dump for all external tables under the "_bootstrap" directory.
- Also includes the data locations file "_external_tables_info".
- The LIMIT and TO clauses must not be used, so that the latest events are dumped before the external tables are bootstrap dumped.
2. REPL LOAD on this dump applies all the events first, then copies the external tables' data, and then bootstraps the external tables (metadata).
- It is possible that the external tables (metadata) are not point-in-time consistent with the rest of the tables.
- However, they become eventually consistent once the next incremental load is applied.
- This REPL LOAD is fault-tolerant and can be retried if it fails.
3. All future REPL DUMPs on this repl policy should set hive.repl.bootstrap.external.tables=false.
- If it is not set to false, the target might end up with an inconsistent set of external tables, because bootstrap does not clean up any dropped external tables.
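The three steps above can be sketched as the following command sequence. It is an illustration under stated assumptions rather than a verified procedure: the event IDs (1000, 1200) and the dump directory path are hypothetical placeholders, and the REPL LOAD form shown is the Hive 3.x style that takes the dump directory returned by REPL DUMP.

```sql
-- Step 1 (source cluster): one-time incremental dump that also bootstraps
-- external tables. No TO or LIMIT clause, so the latest events are included.
-- 1000 is a hypothetical last-replicated event ID.
REPL DUMP db1 FROM 1000
WITH (
  'hive.repl.include.external.tables'='true',
  'hive.repl.bootstrap.external.tables'='true'
);

-- Step 2 (target cluster): load the dump directory returned by step 1.
-- The path is a hypothetical placeholder. The load can be retried on failure.
REPL LOAD db1 FROM '/hypothetical/repl/dump/dir';

-- Step 3 (source cluster): every later incremental dump keeps external
-- tables included but turns the bootstrap switch off again.
-- 1200 is a hypothetical event ID reported after the step 2 load.
REPL DUMP db1 FROM 1200
WITH (
  'hive.repl.include.external.tables'='true',
  'hive.repl.bootstrap.external.tables'='false'
);
```

Leaving the bootstrap switch at true in step 3 would re-bootstrap the external tables on every cycle without removing ones that were dropped on the source, which is the inconsistency the last step warns about.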
Attachments
Issue Links
- relates to: HIVE-21286 Hive should support clean-up of previously bootstrapped tables when retry from different dump. (Closed)