SPARK-45449: Cache Invalidation Issue with JDBC Table

Details

    Description

      We have identified a cache invalidation issue when caching JDBC tables in Spark SQL. The cached table is unexpectedly invalidated when queried, leading to a re-read from the JDBC table instead of retrieving data from the cache.
      Example SQL:

      CACHE TABLE cache_t SELECT * FROM mysql.test.test1;
      SELECT * FROM cache_t;
      

      Expected Behavior:
      The expectation is that querying the cached table (cache_t) should retrieve the result from the cache without re-evaluating the execution plan.

      Actual Behavior:
      However, the cache is invalidated, and the content is re-read from the JDBC table.

      Root Cause:
      The issue lies in the 'CacheData' class, where the comparison involves 'JDBCTable'. 'JDBCTable' is a case class:

      case class JDBCTable(ident: Identifier, schema: StructType, jdbcOptions: JDBCOptions)
      

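      A case class compares its fields one by one using 'equals'. The 'jdbcOptions' field is an instance of 'JDBCOptions', which is not a case class, so unless that class overrides 'equals' the comparison falls back to reference (pointer) equality. Two 'JDBCTable' instances built from identical options are then treated as different, the cached plan is not matched, and this leads to unnecessary cache invalidation.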
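
      The equality behavior described above can be reproduced without Spark. The following is a minimal sketch, not Spark's code: 'PlainOptions', 'ComparableOptions', 'PlainTable', 'ComparableTable' and 'EqualityDemo' are hypothetical stand-ins invented for illustration. It shows that a case class holding a regular class compares that field by reference, and that overriding 'equals' and 'hashCode' on the field's class is one possible way to restore structural comparison.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Spark's classes.

// A regular class with no equals override: == is reference equality.
class PlainOptions(val parameters: Map[String, String])

// The same data, but with structural equals/hashCode defined explicitly.
class ComparableOptions(val parameters: Map[String, String]) {
  override def equals(other: Any): Boolean = other match {
    case that: ComparableOptions => parameters == that.parameters
    case _                       => false
  }
  override def hashCode(): Int = parameters.hashCode()
}

// Case classes generate equals by comparing each field with ==.
case class PlainTable(name: String, options: PlainOptions)
case class ComparableTable(name: String, options: ComparableOptions)

object EqualityDemo {
  private val params =
    Map("url" -> "jdbc:mysql://localhost/test", "dbtable" -> "test1")

  // Two tables built from identical parameters but distinct option objects.
  val plainEqual: Boolean =
    PlainTable("test1", new PlainOptions(params)) ==
      PlainTable("test1", new PlainOptions(params))

  val comparableEqual: Boolean =
    ComparableTable("test1", new ComparableOptions(params)) ==
      ComparableTable("test1", new ComparableOptions(params))

  def main(args: Array[String]): Unit = {
    println(s"plain options equal:      $plainEqual")
    println(s"comparable options equal: $comparableEqual")
  }
}
```

      In this sketch 'plainEqual' is false even though both tables carry the same parameters, which mirrors the mismatch reported here. Whether the actual fix should override 'equals' on the options class or compare plans some other way is a design decision for the Spark maintainers.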


          People

            liangyongyuan
            Votes: 0
            Watchers: 2
