SPARK-15678: Not use cache on appends and overwrites


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels: None

Description

    Spark SQL currently doesn't drop its cache when the underlying data is overwritten.

    val dir = "/tmp/test"
    sqlContext.range(1000).write.mode("overwrite").parquet(dir)
    val df = sqlContext.read.parquet(dir).cache()
    df.count() // returns 1000

    sqlContext.range(10).write.mode("overwrite").parquet(dir)
    sqlContext.read.parquet(dir).count() // returns 1000 instead of 10: the stale cache is still being used
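
    As a workaround, the stale entry can be dropped by hand before re-reading the path. A minimal sketch, assuming the same spark-shell session as above; unpersist() and clearCache() are the generic cache-eviction calls, not the fix for this issue:

    // Evict the cached data for df; a later read of the same path then
    // scans the files on disk instead of hitting the stale cache.
    df.unpersist()
    sqlContext.read.parquet(dir).count() // now returns 10

    // Heavier-handed alternative: evict every cached dataset in the session.
    sqlContext.clearCache()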
      


People

    Assignee: Sameer Agarwal (sameerag)
    Reporter: Sameer Agarwal (sameerag)
    Votes: 0
    Watchers: 8

Dates

    Created:
    Updated:
    Resolved: