[SPARK-45449] Cache Invalidation Issue with JDBC Table - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 3.5.0
Fix Version/s: 4.0.0, 3.5.1
Component/s: SQL
Labels:
- pull-request-available

Description

We have identified a cache invalidation issue when caching JDBC tables in Spark SQL. The cached table is unexpectedly invalidated when queried, leading to a re-read from the JDBC table instead of retrieving data from the cache.
Example SQL:

CACHE TABLE cache_t SELECT * FROM mysql.test.test1;
SELECT * FROM cache_t;

Expected Behavior:
Querying the cached table (cache_t) should return the result from the cache without re-evaluating the execution plan.

Actual Behavior:
Instead, the cache is invalidated, and the content is re-read from the JDBC table.

Root Cause:
The issue lies in the 'CacheData' class, where the comparison involves 'JDBCTable'. 'JDBCTable' is a case class:

case class JDBCTable(ident: Identifier, schema: StructType, jdbcOptions: JDBCOptions)

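Case class equality compares each field with that field's own equals method. Components that are not case classes, such as 'jdbcOptions', are compared by reference (pointer comparison), so two 'JDBCTable' instances built from identical options compare as unequal. This leads to unnecessary cache invalidation.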
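
The equality behavior described above can be reproduced without Spark. The following is a minimal sketch using hypothetical stand-in classes ('PlainOptions', 'ValueOptions', 'PlainTable', 'ValueTable'); these are illustrative assumptions, not Spark's real 'JDBCOptions' or 'JDBCTable'. It shows that a case class holding a plain class field compares unequal across separately constructed instances, and that one possible way to avoid this is for the field's class to define value-based equals and hashCode.

```scala
// Hypothetical stand-in for a non-case class such as 'jdbcOptions'.
// It does not override equals, so == falls back to reference equality.
class PlainOptions(val parameters: Map[String, String])

// Hypothetical variant that compares by value instead of by reference.
class ValueOptions(val parameters: Map[String, String]) {
  override def equals(other: Any): Boolean = other match {
    case that: ValueOptions => parameters == that.parameters
    case _                  => false
  }
  override def hashCode(): Int = parameters.hashCode()
}

// Case classes generate equals by comparing each field with ==.
case class PlainTable(name: String, options: PlainOptions)
case class ValueTable(name: String, options: ValueOptions)

object EqualityDemo {
  // Illustrative parameters only; the URL is a placeholder.
  val params: Map[String, String] =
    Map("url" -> "jdbc:mysql://host/test", "dbtable" -> "test1")

  // Two tables built from equal parameters but distinct option objects.
  val plainMatch: Boolean =
    PlainTable("test1", new PlainOptions(params)) ==
      PlainTable("test1", new PlainOptions(params))

  val valueMatch: Boolean =
    ValueTable("test1", new ValueOptions(params)) ==
      ValueTable("test1", new ValueOptions(params))

  // Sharing one option object makes even the plain variant compare equal.
  val shared = new PlainOptions(params)
  val sharedMatch: Boolean =
    PlainTable("test1", shared) == PlainTable("test1", shared)

  def main(args: Array[String]): Unit = {
    println(s"plain options, distinct instances: $plainMatch")
    println(s"value options, distinct instances: $valueMatch")
    println(s"plain options, shared instance:    $sharedMatch")
  }
}
```

In this sketch only the first comparison fails, which mirrors the situation where a plan that is re-resolved with a fresh options object no longer matches the plan stored in the cache.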

Issue Links

links to

GitHub Pull Request #43258

People

Assignee: liangyongyuan
Reporter: liangyongyuan
Votes: 0
Watchers: 2

Dates

Created: 07/Oct/23 03:36
Updated: 10/Oct/23 06:43
Resolved: 10/Oct/23 06:42
