Spark / SPARK-27403

Fix `updateTableStats` to update table stats always with new stats or None

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
    • Fix Version/s: 2.4.2, 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      The system should update table statistics automatically when the user sets spark.sql.statistics.size.autoUpdate.enabled to true. Currently this property has no effect, whether it is enabled or disabled. The feature is similar to Hive's auto-gather feature, where statistics are computed automatically by default if the feature is enabled.
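One way to read the issue title is as a two-branch rule: when the setting is on, store freshly computed stats; otherwise store None so that stale values are never reported. The sketch below models that rule in plain Scala. `TableStats`, `StatsUpdateSketch`, `statsAfterWrite`, and `computeSize` are hypothetical names chosen for illustration; they are not Spark's internal API, and the actual fix may differ in detail.

```scala
// Hypothetical, simplified model of the rule in the issue title.
// None of these names belong to Spark.
final case class TableStats(sizeInBytes: BigInt)

object StatsUpdateSketch {
  /** Stats to store in the catalog after a data-changing command.
    *
    * @param autoUpdateEnabled stands in for the value of
    *                          spark.sql.statistics.size.autoUpdate.enabled
    * @param computeSize       stands in for measuring the table's current size
    */
  def statsAfterWrite(
      autoUpdateEnabled: Boolean,
      computeSize: () => BigInt): Option[TableStats] =
    if (autoUpdateEnabled) {
      // Always refresh, even if the table had no stats before the write.
      Some(TableStats(computeSize()))
    } else {
      // Drop stats so that an outdated size is not kept.
      None
    }
}
```

Under this reading, the previous state of the table's stats does not influence the result; only the setting does.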

      Reference:

      https://cwiki.apache.org/confluence/display/Hive/StatsDev

      Steps to reproduce:

      scala> spark.sql("create table table1 (name string,age int) stored as parquet")

      scala> spark.sql("insert into table1 select 'a',29")
      res2: org.apache.spark.sql.DataFrame = []

      scala> spark.sql("desc extended table1").show(false)
      +----------------------------+--------------------------------------------------------------+-------+
      |col_name                    |data_type                                                     |comment|
      +----------------------------+--------------------------------------------------------------+-------+
      |name                        |string                                                        |null   |
      |age                         |int                                                           |null   |
      |                            |                                                              |       |
      |# Detailed Table Information|                                                              |       |
      |Database                    |default                                                       |       |
      |Table                       |table1                                                        |       |
      |Owner                       |Administrator                                                 |       |
      |Created Time                |Sun Apr 07 23:41:56 IST 2019                                  |       |
      |Last Access                 |Thu Jan 01 05:30:00 IST 1970                                  |       |
      |Created By                  |Spark 2.4.1                                                   |       |
      |Type                        |MANAGED                                                       |       |
      |Provider                    |hive                                                          |       |
      |Table Properties            |[transient_lastDdlTime=1554660716]                            |       |
      |Location                    |file:/D:/spark-2.4.1-bin-hadoop2.7/bin/spark-warehouse/table1 |       |
      |Serde Library               |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |       |
      |InputFormat                 |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |       |
      |OutputFormat                |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|       |
      |Storage Properties          |[serialization.format=1]                                      |       |
      |Partition Provider          |Catalog                                                       |       |
      +----------------------------+--------------------------------------------------------------+-------+
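To check whether the setting takes effect, the steps above can be extended roughly as follows. This is a sketch meant for spark-shell, where `spark` is the predefined SparkSession and `table1` already exists. The second inserted row (`'b',30`) is made-up sample data, and the check assumes that `desc extended` lists a `Statistics` row only when the catalog holds stats for the table.

```scala
// Run inside spark-shell; `spark` is the predefined SparkSession.
// Enable automatic size updates for this session.
spark.conf.set("spark.sql.statistics.size.autoUpdate.enabled", "true")

// Any data-changing command should now refresh the table stats.
// The values 'b' and 30 are illustrative only.
spark.sql("insert into table1 select 'b',30")

// Keep only the Statistics row of the extended description, if any.
val statsRows = spark.sql("desc extended table1").filter("col_name = 'Statistics'")
statsRows.show(false)

// false would indicate that no stats were recorded for the table.
println(s"Statistics row present: ${statsRows.count() > 0}")
```

On a build without the fix, the reported behavior suggests the last line would print `false`; on a fixed build it would be expected to print `true`.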

              People

              • Assignee: S71955 Sujith Chacko
              • Reporter: S71955 Sujith Chacko
              • Votes: 0
              • Watchers: 2
