  1. Spark
  2. SPARK-33507 Improve and fix cache behavior in v1 and v2
  3. SPARK-34060

ALTER TABLE .. DROP PARTITION uncaches Hive table while updating table stats

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.0.2, 3.1.1, 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

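      The example below illustrates the issue. With spark.sql.statistics.size.autoUpdate.enabled set to true, ALTER TABLE .. DROP PARTITION uncaches a cached Hive table, so the final isCached call returns false: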

      scala> spark.conf.set("spark.sql.statistics.size.autoUpdate.enabled", true)
      
      scala> sql(s"CREATE TABLE tbl (id int, part int) USING hive PARTITIONED BY (part)")
      21/01/10 13:19:59 WARN HiveMetaStore: Location: file:/Users/maximgekk/proj/apache-spark/spark-warehouse/tbl specified for non-external table:tbl
      res12: org.apache.spark.sql.DataFrame = []
      
      scala> sql("INSERT INTO tbl PARTITION (part=0) SELECT 0")
      res13: org.apache.spark.sql.DataFrame = []
      
      scala> sql("INSERT INTO tbl PARTITION (part=1) SELECT 1")
      res14: org.apache.spark.sql.DataFrame = []
      
      scala> sql("CACHE TABLE tbl")
      res15: org.apache.spark.sql.DataFrame = []
      
      scala> sql("SELECT * FROM tbl").show(false)
      +---+----+
      |id |part|
      +---+----+
      |0  |0   |
      |1  |1   |
      +---+----+
      
      
      scala> spark.catalog.isCached("tbl")
      res17: Boolean = true
      
      scala> sql("ALTER TABLE tbl DROP PARTITION (part=0)")
      res18: org.apache.spark.sql.DataFrame = []
      
      scala> spark.catalog.isCached("tbl")
      res19: Boolean = false
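
      The session above can be condensed into a small standalone check. This is a minimal sketch, assuming a Spark build with Hive support on the classpath; the object DropPartitionCacheCheck and the method cachedBeforeAndAfterDrop are hypothetical names introduced here for illustration and are not part of Spark. It reports the cache state before and after the partition is dropped. The table would be expected to stay cached, while the session above shows it being uncached.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper (not part of Spark): reports whether `table` is cached
// before and after one of its partitions is dropped.
object DropPartitionCacheCheck {
  def cachedBeforeAndAfterDrop(spark: SparkSession, table: String): (Boolean, Boolean) = {
    // Same setting as in the report: update table stats automatically.
    spark.conf.set("spark.sql.statistics.size.autoUpdate.enabled", true)

    // Recreate the partitioned Hive table and fill two partitions.
    spark.sql(s"DROP TABLE IF EXISTS $table")
    spark.sql(s"CREATE TABLE $table (id int, part int) USING hive PARTITIONED BY (part)")
    spark.sql(s"INSERT INTO $table PARTITION (part=0) SELECT 0")
    spark.sql(s"INSERT INTO $table PARTITION (part=1) SELECT 1")

    spark.sql(s"CACHE TABLE $table")
    val before = spark.catalog.isCached(table)

    // The statement under test: it should not drop the table from the cache.
    spark.sql(s"ALTER TABLE $table DROP PARTITION (part=0)")
    val after = spark.catalog.isCached(table)

    (before, after)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session with Hive support is available.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-34060 check")
      .enableHiveSupport()
      .getOrCreate()
    try {
      val (before, after) = cachedBeforeAndAfterDrop(spark, "tbl")
      println(s"cached before drop: $before, cached after drop: $after")
    } finally {
      spark.sql("DROP TABLE IF EXISTS tbl")
      spark.stop()
    }
  }
}
```

      If the second value comes back false, the build still shows the behavior described in this issue.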
      

          People

            Assignee: Max Gekk (maxgekk)
            Reporter: Max Gekk (maxgekk)
            Votes: 0
            Watchers: 3
