[SPARK-1669] SQLContext.cacheTable() should be idempotent - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 1.0.0
Fix Version/s: 1.0.1, 1.1.0
Component/s: SQL
Labels:
- cache
- column

Description

Calling cacheTable() on a table t more than once causes t to be cached multiple times. This behavior differs from RDD.cache(), which is idempotent.

We can tell whether a table is already cached by checking that:

- the underlying logical plan of the table matches the pattern Subquery(_, SparkLogicalPlan(inMem @ InMemoryColumnarTableScan(_, _)))
- inMem.cachedColumnBuffers.getStorageLevel.useMemory is true
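
The two checks can be sketched as a guard in front of the caching step. This is a minimal, self-contained model rather than Spark's actual code: every class below (LogicalPlan, Relation, Catalog, CacheSketch, and the simplified Subquery, SparkLogicalPlan and InMemoryColumnarTableScan) is a hypothetical stand-in, and the useMemory field stands in for inMem.cachedColumnBuffers.getStorageLevel.useMemory.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for the Spark SQL plan classes named in this issue.
// They only mirror the shape of the pattern, not the real constructors.
sealed trait LogicalPlan
case class Relation(name: String) extends LogicalPlan
case class Subquery(alias: String, child: LogicalPlan) extends LogicalPlan
case class SparkLogicalPlan(scan: PhysicalPlan) extends LogicalPlan

sealed trait PhysicalPlan
// useMemory is an assumption standing in for
// cachedColumnBuffers.getStorageLevel.useMemory.
case class InMemoryColumnarTableScan(source: LogicalPlan, useMemory: Boolean)
  extends PhysicalPlan

// Hypothetical table catalog: maps a table name to its logical plan.
class Catalog {
  private val tables = mutable.Map.empty[String, LogicalPlan]
  def register(name: String, plan: LogicalPlan): Unit = tables(name) = plan
  def lookup(name: String): LogicalPlan = tables(name)
}

object CacheSketch {
  // Check 1: the plan has the cached shape. Check 2: the buffers are in memory.
  def isCached(plan: LogicalPlan): Boolean = plan match {
    case Subquery(_, SparkLogicalPlan(inMem @ InMemoryColumnarTableScan(_, _))) =>
      inMem.useMemory
    case _ => false
  }

  // Wrap the plan in an in-memory scan only if it is not already cached,
  // so repeated calls leave the catalog unchanged.
  def cacheTable(catalog: Catalog, name: String): Unit = {
    val current = catalog.lookup(name)
    if (!isCached(current)) {
      val scan = InMemoryColumnarTableScan(current, useMemory = true)
      catalog.register(name, Subquery(name, SparkLogicalPlan(scan)))
    }
  }
}
```

In this model, calling CacheSketch.cacheTable(catalog, "t") twice leaves the same plan registered for t after the second call as after the first, which is the idempotent behavior requested here.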

People

Assignee: Cheng Lian
Reporter: Cheng Lian
Votes: 0
Watchers: 3

Dates

Created: 29/Apr/14 23:12
Updated: 23/Jun/14 21:55
Resolved: 23/Jun/14 21:55