Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-34120 Improve the statistics estimation
  3. SPARK-33954

Some operator missing rowCount when enable CBO

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 3.2.0
    • 3.2.0
    • SQL
    • None

    Description

      Some operator missing rowCount when enable CBO, for example:

      spark.range(1000).selectExpr("id as a", "id as b").write.saveAsTable("t1")
      spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR ALL COLUMNS")
      spark.sql("set spark.sql.cbo.enabled=true")
      spark.sql("set spark.sql.cbo.planStats.enabled=true")
      spark.sql("select * from (select * from t1 distribute by a limit 100) distribute by b").explain("cost")
      

      Current:

      == Optimized Logical Plan ==
      RepartitionByExpression [b#2129L], Statistics(sizeInBytes=2.3 KiB)
      +- GlobalLimit 100, Statistics(sizeInBytes=2.3 KiB, rowCount=100)
         +- LocalLimit 100, Statistics(sizeInBytes=23.4 KiB)
            +- RepartitionByExpression [a#2128L], Statistics(sizeInBytes=23.4 KiB)
               +- Relation[a#2128L,b#2129L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
      

      Expected:

      == Optimized Logical Plan ==
      RepartitionByExpression [b#2129L], Statistics(sizeInBytes=2.3 KiB, rowCount=100)
      +- GlobalLimit 100, Statistics(sizeInBytes=2.3 KiB, rowCount=100)
         +- LocalLimit 100, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
            +- RepartitionByExpression [a#2128L], Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
               +- Relation[a#2128L,b#2129L] parquet, Statistics(sizeInBytes=23.4 KiB, rowCount=1.00E+3)
      

      Attachments

        Activity

          People

            yumwang Yuming Wang
            yumwang Yuming Wang
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: