Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
Description
External tables in Hive are often used when the data is created and managed by applications outside of Hive. Several problems can occur when external applications write data to table directories:
- If an application writes files to a table/partition at the same time that Hive tries to merge files for the same table/partition (ALTER TABLE CONCATENATE, or hive.merge.tezfiles during insert), data can be lost.
- When external applications have added new data to the table, the Hive table statistics are often out of date with respect to the current state of the data. This can produce wrong results when queries are answered from statistics, or lead to bad query plans.
Some of these operations should be blocked in Hive. It looks like some already have been (HIVE-17403).
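To make the two failure modes concrete, here is a minimal toy model in Python. It is not Hive's merge implementation: it assumes, for illustration only, that a merge snapshots the file listing, combines those files, and then replaces the directory contents, and that statistics are a row count cached at one point in time. All names (naive_merge, write_file, demo, the part_N.txt files) are hypothetical.

```python
import os
import tempfile


def write_file(table_dir, name, rows):
    """Write one data file, one row per line."""
    with open(os.path.join(table_dir, name), "w") as fh:
        fh.write("\n".join(rows) + "\n")


def list_data_files(table_dir):
    """Return the sorted data file names currently in the table directory."""
    return sorted(f for f in os.listdir(table_dir) if f.endswith(".txt"))


def read_rows(table_dir):
    """Read every row from every data file currently in the directory."""
    rows = []
    for name in list_data_files(table_dir):
        with open(os.path.join(table_dir, name)) as fh:
            rows.extend(line.rstrip("\n") for line in fh if line.strip())
    return rows


def naive_merge(table_dir, between_steps=None):
    """Toy merge (assumed behavior): snapshot, combine, replace directory contents."""
    rows = read_rows(table_dir)  # snapshot of what the merge has seen
    if between_steps is not None:
        between_steps()  # stands in for an external writer running concurrently
    # Replace step: everything now in the directory is removed,
    # including files that arrived after the snapshot was taken.
    for name in list_data_files(table_dir):
        os.remove(os.path.join(table_dir, name))
    write_file(table_dir, "merged_0.txt", rows)


def demo():
    with tempfile.TemporaryDirectory() as table_dir:
        write_file(table_dir, "part_0.txt", ["a", "b"])
        write_file(table_dir, "part_1.txt", ["c"])
        # "Statistics" gathered once, before any external writes.
        cached_row_count = len(read_rows(table_dir))

        def external_writer():
            write_file(table_dir, "part_2.txt", ["d", "e"])

        naive_merge(table_dir, between_steps=external_writer)
        rows_after_merge = read_rows(table_dir)

        # A later external write that nothing records in the cached statistics.
        write_file(table_dir, "part_3.txt", ["f"])
        actual_row_count = len(read_rows(table_dir))

        return {
            "rows_after_merge": rows_after_merge,
            "cached_row_count": cached_row_count,
            "actual_row_count": actual_row_count,
        }


if __name__ == "__main__":
    print(demo())
```

In this sketch the rows "d" and "e" written during the merge do not survive it, and the cached row count no longer matches the data once "f" is added, so a count answered from the cached value would be wrong.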
Issue Links
- is related to: HIVE-27409 Iceberg: table with EXTERNAL type can not use statistics to optimize the query (Closed)