SPARK-15678: Not use cache on appends and overwrites


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels: None

Description

    Spark SQL currently doesn't drop its cache when the underlying data is overwritten.

    val dir = "/tmp/test"
    sqlContext.range(1000).write.mode("overwrite").parquet(dir)
    val df = sqlContext.read.parquet(dir).cache()
    df.count() // returns 1000

    sqlContext.range(10).write.mode("overwrite").parquet(dir)
    sqlContext.read.parquet(dir).count() // returns 1000 instead of 10: the stale cache is still being used
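
    As a workaround, the stale entry can be dropped by hand before re-reading the path. A minimal sketch, assuming the same spark-shell session as above; unpersist() and clearCache() are the generic cache-eviction calls, not the fix for this issue:

    // Evict the cached data for df; a later read of the same path then
    // scans the files on disk instead of hitting the stale cache.
    df.unpersist()
    sqlContext.read.parquet(dir).count() // now returns 10

    // Heavier-handed alternative: evict every cached dataset in the session.
    sqlContext.clearCache()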
      


People

    Assignee: Sameer Agarwal (sameerag)
    Reporter: Sameer Agarwal (sameerag)
    Votes: 0
    Watchers: 8

Dates

    Created:
    Updated:
    Resolved: