Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version/s: 2.0.0
- Component/s: None
- Labels: None
Description
Spark SQL currently does not invalidate cached data when the underlying files are overwritten, so re-reading the same path silently returns stale results:
val dir = "/tmp/test"
sqlContext.range(1000).write.mode("overwrite").parquet(dir)
val df = sqlContext.read.parquet(dir).cache()
df.count()                                            // outputs 1000
sqlContext.range(10).write.mode("overwrite").parquet(dir)
sqlContext.read.parquet(dir).count()                  // outputs 1000 instead of 10 <---- We are still using the cached dataset
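
Until the fix, a common workaround is to explicitly evict the stale cache entry before re-reading the path. A minimal sketch, assuming a spark-shell session where sqlContext is in scope as in the repro above:

val dir = "/tmp/test"
val df = sqlContext.read.parquet(dir).cache()
df.count()                                            // 1000, materializes the cache
sqlContext.range(10).write.mode("overwrite").parquet(dir)
df.unpersist()                                        // evict the stale cached relation
sqlContext.read.parquet(dir).count()                  // 10, re-reads the rewritten files

If the original DataFrame handle is no longer available, sqlContext.clearCache() drops all cached data instead, at the cost of losing every other cached relation as well.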