Spark / SPARK-34120 Improve the statistics estimation / SPARK-33959

Improve the statistics estimation of the Tail

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

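      import org.apache.spark.sql.catalyst.expressions.Literal
      import org.apache.spark.sql.catalyst.plans.logical.Tail
      spark.sql("set spark.sql.cbo.enabled=true")
      spark.range(100).selectExpr("id as a", "id as b", "id as c", "id as e").write.saveAsTable("t1")
      // Tail is a LogicalPlan and has no queryExecution of its own, so build one through the session state.
      println(spark.sessionState.executePlan(Tail(Literal(5), spark.sql("SELECT * FROM t1").queryExecution.logical)).explainString(org.apache.spark.sql.execution.CostMode))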
      
      

      Current:

      == Optimized Logical Plan ==
      Tail 5, Statistics(sizeInBytes=3.8 KiB)
      +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
      

      Expected:

      == Optimized Logical Plan ==
      Tail 5, Statistics(sizeInBytes=200.0 B, rowCount=5)
      +- Relation[a#24L,b#25L,c#26L,e#27L] parquet, Statistics(sizeInBytes=3.8 KiB)
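
      The expected figures can be reproduced by hand. The sketch below is a standalone illustration, not Spark's actual code. It assumes a row is sized as 8 bytes of overhead plus 8 bytes per bigint column, which is consistent with the 200.0 B shown above for 5 rows of 4 bigint columns. The names TailStatsSketch, Stats and tailStats are hypothetical, and capping the row count at the child's row count is an assumption about how a tail should behave when that count is known.

```scala
// Sketch only: mirrors the arithmetic behind the expected output, not Spark's implementation.
object TailStatsSketch {
  // Assumption: 8 bytes of per-row overhead plus 8 bytes for each bigint (LongType) column.
  val rowOverheadBytes: Long = 8L
  val longColumnBytes: Long = 8L

  final case class Stats(sizeInBytes: BigInt, rowCount: Option[BigInt])

  // Hypothetical helper: Tail(limit) yields at most `limit` rows,
  // and no more rows than the child has when the child's row count is known.
  def tailStats(limit: Int, numLongColumns: Int, childRowCount: Option[BigInt]): Stats = {
    val rows: BigInt = childRowCount.map(_.min(BigInt(limit))).getOrElse(BigInt(limit))
    val sizePerRow: BigInt = BigInt(rowOverheadBytes + numLongColumns * longColumnBytes)
    Stats(rows * sizePerRow, Some(rows))
  }

  def main(args: Array[String]): Unit = {
    // t1 has four bigint columns (a, b, c, e); the relation above reports no row count.
    println(tailStats(limit = 5, numLongColumns = 4, childRowCount = None))
  }
}
```

      With these assumptions each row is 8 + 4 * 8 = 40 bytes, so Tail 5 comes to 5 * 40 = 200 bytes with rowCount=5, matching the expected plan.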
      

          People

            Assignee: Yuming Wang (yumwang)
            Reporter: Yuming Wang (yumwang)