  1. Spark
  2. SPARK-34120 Improve the statistics estimation
  3. SPARK-34031

Union operator is missing rowCount when CBO is enabled

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      spark.sql("CREATE TABLE t1 USING parquet AS SELECT id FROM RANGE(10)")
      spark.sql("CREATE TABLE t2 USING parquet AS SELECT id FROM RANGE(10)")
      spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
      spark.sql("ANALYZE TABLE t2 COMPUTE STATISTICS FOR ALL COLUMNS")
      spark.sql("set spark.sql.cbo.enabled=true")
      spark.sql("SELECT * FROM t1 UNION ALL SELECT * FROM t2").explain("cost")
      

      Current:

      == Optimized Logical Plan ==
      Union false, false, Statistics(sizeInBytes=320.0 B)
      :- Relation[id#5880L] parquet, Statistics(sizeInBytes=160.0 B, rowCount=10)
      +- Relation[id#5881L] parquet, Statistics(sizeInBytes=160.0 B, rowCount=10)
      

      Expected:

      == Optimized Logical Plan ==
      Union false, false, Statistics(sizeInBytes=320.0 B, rowCount=20)
      :- Relation[id#2138L] parquet, Statistics(sizeInBytes=160.0 B, rowCount=10)
      +- Relation[id#2139L] parquet, Statistics(sizeInBytes=160.0 B, rowCount=10)
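
The expected plan above implies a simple rule: a Union's rowCount is the sum of its children's rowCount values, and it is only known when every child reports one. The following is a minimal sketch of that rule in plain Python. It is not Spark's implementation; the `Stats` class and the `union_stats` function are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Stats:
    """Hypothetical stand-in for a plan node's statistics (not a Spark class)."""
    size_in_bytes: int
    row_count: Optional[int] = None  # None means the row count is unknown


def union_stats(children: Sequence[Stats]) -> Stats:
    """Combine child statistics for a UNION ALL node (illustrative only)."""
    # The size estimate is the sum of the children's sizes.
    size = sum(c.size_in_bytes for c in children)
    # The row count is only reported when every child has one.
    if all(c.row_count is not None for c in children):
        rows = sum(c.row_count for c in children)
    else:
        rows = None
    return Stats(size, rows)


# Values taken from the plans in this report: two relations of 160 B and 10 rows each.
t1 = Stats(size_in_bytes=160, row_count=10)
t2 = Stats(size_in_bytes=160, row_count=10)
print(union_stats([t1, t2]))  # Stats(size_in_bytes=320, row_count=20)
```

If any child lacks a row count (for example, a table that was never analyzed), this sketch leaves the Union's row count unknown rather than guessing.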
      

          People

            yumwang Yuming Wang