Details
- Type: Bug
- Status: Open
- Priority: Blocker
- Resolution: Unresolved
None
Description
Currently, after most DML operations in Spark SQL, Hudi invokes `Catalog.refreshTable`.
Prior to Spark 3.2, this essentially did the following:
- Invalidated the relation cache, forcing the relation to be re-resolved on next access (creating a new FileIndex, listing files, etc.)
- Triggered cascading invalidation (re-caching) of the cached data (in CacheManager)
As of Spark 3.2, it additionally calls `LogicalRelation.refresh` for ALL tables (previously this was done only for temporary views). This triggers `FileIndex.refresh`, so the whole table is listed again, which can be a costly operation.
We should revert to the Spark 3.1 behavior.
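
To make the cost difference concrete, here is a minimal toy model of the two behaviors described above. It is not Spark or Hudi code: `ToyCatalog`, `refresh_table_pre_32`, `refresh_table_32`, and `listings_during_refresh` are hypothetical names invented for illustration, and the model only counts how often a table would be listed, assuming the relation was already resolved once before the refresh.

```python
# Toy model only: these names are hypothetical stand-ins, not Spark or Hudi APIs.


class ToyCatalog:
    def __init__(self):
        self.resolved = set()         # tables with a cached (resolved) relation
        self.listings = 0             # how many times files were listed
        self.cache_invalidations = 0  # cascading invalidations of cached data

    def _list_files(self, table):
        # Stands in for the potentially expensive file listing of a FileIndex.
        self.listings += 1

    def resolve(self, table):
        # Resolving an uncached relation builds a new index, which lists files.
        if table not in self.resolved:
            self._list_files(table)
            self.resolved.add(table)

    def refresh_table_pre_32(self, table):
        # 1. Invalidate the relation cache; re-resolution waits for the next access.
        self.resolved.discard(table)
        # 2. Cascade invalidation of the cached data.
        self.cache_invalidations += 1

    def refresh_table_32(self, table):
        # Extra eager step: refresh the relation, re-listing the table right away.
        self.resolve(table)
        self._list_files(table)
        # Then the same two steps as before.
        self.refresh_table_pre_32(table)


def listings_during_refresh(refresh_name):
    """Count the listings performed inside the refresh call itself."""
    catalog = ToyCatalog()
    catalog.resolve("t")  # the table was queried once before the DML
    before = catalog.listings
    getattr(catalog, refresh_name)("t")
    return catalog.listings - before
```

In this model the pre-3.2 path performs no listing inside the refresh call and defers it to the next query, while the 3.2-style path lists the table during the refresh and again when the relation is next resolved.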