Spark / SPARK-26708

Incorrect result caused by inconsistency between a SQL cache's cached RDD and its physical plan


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.1, 3.0.0
    • Component/s: SQL
    • Labels:

      Description

      When performing non-cascading cache invalidation, recache is called on the other cache entries that depend on the cache being invalidated. This causes the physical plans of those cache entries to be recompiled. For any of those cache entries whose cached RDD has already been persisted, the persisted data may be inconsistent with the new plan. This causes a correctness issue if the new plan's outputPartitioning or outputOrdering differs from that of the actual data, and the cache is meanwhile used by another query that requires a specific outputPartitioning or outputOrdering which matches the new plan but not the actual data.
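      A toy model can make this failure mode concrete. The Python sketch below is not Spark code: CacheEntry, hash_partition, recompile_plan and count_by are hypothetical names, a plan's outputPartitioning is reduced to a single column name, and a row's partition is its column value modulo 2. It assumes that the consuming query, like a query planner, skips the shuffle when the cached plan already reports the partitioning it needs. Once the plan is recompiled but the persisted data is not, the same group key ends up aggregated separately in two partitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical toy model, not Spark code: each row has two int columns (a, b).
Row = Tuple[int, int]
COLS: Dict[str, int] = {"a": 0, "b": 1}
NUM_PARTITIONS = 2


def hash_partition(rows: List[Row], col: str) -> List[List[Row]]:
    """Place each row in partition (value of `col`) % NUM_PARTITIONS."""
    parts: List[List[Row]] = [[] for _ in range(NUM_PARTITIONS)]
    for row in rows:
        parts[row[COLS[col]] % NUM_PARTITIONS].append(row)
    return parts


@dataclass
class CacheEntry:
    data: List[List[Row]]     # persisted partitions; never recomputed
    plan_partitioned_by: str  # stand-in for the plan's outputPartitioning


def recompile_plan(entry: CacheEntry, new_partitioned_by: str) -> None:
    """Model of the problematic recache: only the plan changes, the data stays."""
    entry.plan_partitioned_by = new_partitioned_by


def count_by(entry: CacheEntry, col: str) -> List[Tuple[int, int]]:
    """Count rows per value of `col`, aggregating each partition locally.

    The shuffle is skipped when the plan claims the data is already
    partitioned by `col`.
    """
    if entry.plan_partitioned_by == col:
        parts = entry.data  # trust the plan: no exchange
    else:
        flat = [row for part in entry.data for row in part]
        parts = hash_partition(flat, col)  # simulated shuffle
    result: List[Tuple[int, int]] = []
    for part in parts:
        counts: Dict[int, int] = {}
        for row in part:
            key = row[COLS[col]]
            counts[key] = counts.get(key, 0) + 1
        result.extend(counts.items())
    return sorted(result)


if __name__ == "__main__":
    rows = [(0, 1), (1, 1), (2, 1), (3, 1)]
    entry = CacheEntry(data=hash_partition(rows, "a"), plan_partitioned_by="a")
    print(count_by(entry, "b"))  # consistent: [(1, 4)]
    recompile_plan(entry, "b")   # plan now claims b, data is still split by a
    print(count_by(entry, "b"))  # incorrect: [(1, 2), (1, 2)]
```

      In this model, either rebuilding the persisted data together with the plan or leaving the plan of an already-persisted entry untouched keeps the two consistent, and count_by returns a single group again.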



            People

    • Assignee: Maryann Xue (maryannxue)
    • Reporter: Xiao Li (smilegator)
    • Votes: 0
    • Watchers: 5

              Dates

              • Created:
                Updated:
                Resolved: