  1. Spark
  2. SPARK-33935

Fix CBO's cost function

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.7, 3.0.1, 3.1.0, 3.2.0
    • Fix Version/s: 3.0.2, 3.1.0, 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      The parameter spark.sql.cbo.joinReorder.card.weight is documented as:

      spark.sql.cbo.joinReorder.card.weight
      The weight of cardinality (number of rows) for plan cost comparison in join reorder: rows * weight + size * (1 - weight).
      

      But in the implementation the formula is a bit different:

      Current implementation
          def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
            if (other.planCost.card == 0 || other.planCost.size == 0) {
              false
            } else {
              val relativeRows = BigDecimal(this.planCost.card) / BigDecimal(other.planCost.card)
              val relativeSize = BigDecimal(this.planCost.size) / BigDecimal(other.planCost.size)
              relativeRows * conf.joinReorderCardWeight +
                relativeSize * (1 - conf.joinReorderCardWeight) < 1
            }
          }
      

      This discrepancy has an unfortunate consequence:
      given two plans A and B, both A betterThan B and B betterThan A might give the same result. This happens when one plan has many rows of small size and the other has few rows of large size.

      Example values that exhibit this phenomenon with the default weight (0.7):
      A.card = 500, B.card = 300
      A.size = 30, B.size = 80
      Both A betterThan B and B betterThan A yield a score above 1 (about 1.28 and 1.22 respectively), so both return false.
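
The arithmetic behind this example can be reproduced with a small standalone sketch. It does not use Spark's JoinPlan or SQLConf; the Cost class, the relativeScore helper, and the object name are hypothetical stand-ins that only mirror the relative formula from the current implementation.

```scala
object RelativeScoreExample {
  // Hypothetical stand-in for a plan's cost; not Spark's actual class.
  final case class Cost(card: BigInt, size: BigInt)

  // Same relative formula as the current betterThan: a score below 1 means `a` wins.
  def relativeScore(a: Cost, b: Cost, weight: Double): BigDecimal = {
    val relativeRows = BigDecimal(a.card) / BigDecimal(b.card)
    val relativeSize = BigDecimal(a.size) / BigDecimal(b.size)
    relativeRows * weight + relativeSize * (1 - weight)
  }

  val a = Cost(card = 500, size = 30)
  val b = Cost(card = 300, size = 80)

  val aVsB: BigDecimal = relativeScore(a, b, 0.7) // about 1.28
  val bVsA: BigDecimal = relativeScore(b, a, 0.7) // about 1.22

  def main(args: Array[String]): Unit = {
    // Both scores are above 1, so neither plan is considered better than the other.
    println(s"A vs B: $aVsB, B vs A: $bVsA")
  }
}
```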

      A new implementation is proposed, that matches the documentation:

      Proposed implementation
          def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
            // Absolute cost per the documented formula: rows * weight + size * (1 - weight).
            val thisCost = BigDecimal(this.planCost.card) * conf.joinReorderCardWeight +
              BigDecimal(this.planCost.size) * (1 - conf.joinReorderCardWeight)
            val otherCost = BigDecimal(other.planCost.card) * conf.joinReorderCardWeight +
              BigDecimal(other.planCost.size) * (1 - conf.joinReorderCardWeight)
            // This plan is better when its cost is strictly lower.
            thisCost < otherCost
          }
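
As a rough check, the documented formula can be applied to the same example values outside of Spark. This is a minimal sketch under the assumption that the plan with the lower weighted cost is the better one; Cost, weightedCost, and the object name are hypothetical and not part of Spark.

```scala
object AbsoluteCostExample {
  // Hypothetical stand-in for a plan's cost; not Spark's actual class.
  final case class Cost(card: BigInt, size: BigInt)

  // Documented formula: rows * weight + size * (1 - weight). Lower is better.
  def weightedCost(c: Cost, weight: Double): BigDecimal =
    BigDecimal(c.card) * weight + BigDecimal(c.size) * (1 - weight)

  // `a` is better than `b` when its weighted cost is strictly lower.
  def betterThan(a: Cost, b: Cost, weight: Double): Boolean =
    weightedCost(a, weight) < weightedCost(b, weight)

  val a = Cost(card = 500, size = 30)
  val b = Cost(card = 300, size = 80)

  def main(args: Array[String]): Unit = {
    println(betterThan(a, b, 0.7)) // false: A costs about 359
    println(betterThan(b, a, 0.7)) // true: B costs about 234
  }
}
```

Because each plan is reduced to a single number, at most one of the two comparisons can be true for any pair of plans.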
      

          People

            Assignee: Tanel Kiis (tanelk)
            Reporter: Tanel Kiis (tanelk)