SPARK-1669: SQLContext.cacheTable() should be idempotent

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.0.1, 1.1.0
    • Component/s: SQL
    • Labels:

      Description

      Calling cacheTable() on a table t multiple times causes t to be cached multiple times. These semantics differ from those of RDD.cache(), which is idempotent.

      We can check whether a table is already cached by checking:

      1. whether the structure of the underlying logical plan of the table matches the pattern Subquery(_, SparkLogicalPlan(inMem @ InMemoryColumnarTableScan(_, _)))
      2. whether inMem.cachedColumnBuffers.getStorageLevel.useMemory is true
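
      The two checks above can be sketched as a guard in front of the caching step. This is a minimal, self-contained illustration and not the actual Spark patch: the case classes below are toy stand-ins that only mirror the names in the pattern, and ToyCatalog, Relation, CachedBuffers, register, isCached and cacheBuilds are hypothetical names introduced for the example.

```scala
import scala.collection.mutable

object CacheTableSketch {
  // Toy stand-ins that mirror the names used in the description.
  // They are NOT the real Spark classes.
  sealed trait LogicalPlan
  sealed trait SparkPlan
  case class StorageLevel(useMemory: Boolean)
  case class CachedBuffers(level: StorageLevel) {
    def getStorageLevel: StorageLevel = level
  }
  case class Relation(name: String) extends LogicalPlan // hypothetical base table
  case class Subquery(alias: String, child: LogicalPlan) extends LogicalPlan
  case class SparkLogicalPlan(alreadyPlanned: SparkPlan) extends LogicalPlan
  case class InMemoryColumnarTableScan(attributes: Seq[String], child: LogicalPlan)
      extends SparkPlan {
    // Assumption for the sketch: the buffers are always held in memory.
    val cachedColumnBuffers: CachedBuffers = CachedBuffers(StorageLevel(useMemory = true))
  }

  // Hypothetical catalog mapping table names to logical plans.
  class ToyCatalog {
    private val tables = mutable.Map.empty[String, LogicalPlan]
    var cacheBuilds = 0 // counts how often a table was actually cached

    def register(name: String): Unit = {
      tables(name) = Subquery(name, Relation(name))
    }

    // Check 1: the plan matches the cached-table pattern.
    // Check 2: the cached buffers report an in-memory storage level.
    def isCached(name: String): Boolean = tables.get(name) match {
      case Some(Subquery(_, SparkLogicalPlan(inMem @ InMemoryColumnarTableScan(_, _)))) =>
        inMem.cachedColumnBuffers.getStorageLevel.useMemory
      case _ => false
    }

    // Idempotent: a second call finds the table already cached and does nothing.
    def cacheTable(name: String): Unit = {
      if (!isCached(name)) {
        val current = tables(name) // throws if the table was never registered
        tables(name) =
          Subquery(name, SparkLogicalPlan(InMemoryColumnarTableScan(Seq.empty, current)))
        cacheBuilds += 1
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val catalog = new ToyCatalog
    catalog.register("t")
    catalog.cacheTable("t")
    catalog.cacheTable("t") // no-op: "t" is already cached
    println(s"cache builds: ${catalog.cacheBuilds}")
  }
}
```

      In this sketch the second cacheTable("t") call leaves the plan untouched, which is the behaviour RDD.cache() already has.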

            People

            • Assignee: Cheng Lian
            • Reporter: Cheng Lian
            • Votes: 0
            • Watchers: 3
