Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version: 3.5.0
Description
We have identified a cache invalidation issue when caching JDBC tables in Spark SQL. The cached table is unexpectedly invalidated when queried, leading to a re-read from the JDBC table instead of retrieving data from the cache.
Example SQL:
CACHE TABLE cache_t SELECT * FROM mysql.test.test1;
SELECT * FROM cache_t;
Expected Behavior:
The expectation is that querying the cached table (cache_t) should retrieve the result from the cache without re-evaluating the execution plan.
Actual Behavior:
However, the cache is invalidated, and the content is re-read from the JDBC table.
Root Cause:
The issue lies in the 'CachedData' comparison, where the plan equality check involves 'JDBCTable'. 'JDBCTable' is a case class:
case class JDBCTable(ident: Identifier, schema: StructType, jdbcOptions: JDBCOptions)
'JDBCOptions' is not a case class and does not override equals, so the synthesized equals of 'JDBCTable' compares the 'jdbcOptions' field by reference. Two logically identical 'JDBCTable' instances therefore compare unequal, the cached plan is not matched, and the cache is unnecessarily invalidated.
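The following is a minimal, self-contained Scala sketch of the mechanism, using hypothetical stand-in classes ('ConnectionOptions', 'Table') rather than the actual Spark classes: when a case class has a field whose type does not define value-based equality, the generated equals compares that field by reference, so two logically identical instances are not equal.

// Stand-in for JDBCOptions: a plain class with no equals/hashCode override.
class ConnectionOptions(val parameters: Map[String, String])

// Stand-in for JDBCTable: the synthesized equals compares each field with ==.
case class Table(name: String, options: ConnectionOptions)

object EqualityDemo {
  def main(args: Array[String]): Unit = {
    val params = Map("url" -> "jdbc:mysql://host/test", "dbtable" -> "test1")

    // Two logically identical tables built from equal parameter maps.
    val t1 = Table("test1", new ConnectionOptions(params))
    val t2 = Table("test1", new ConnectionOptions(params))

    // ConnectionOptions falls back to reference equality, so the tables differ.
    println(t1 == t2) // prints false; a cache lookup keyed on such values misses
  }
}

Presumably the same mismatch occurs between the freshly resolved 'JDBCTable' and the one stored with the cached plan, which is why the data is re-read from the JDBC source.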