Spark / SPARK-33935

Fix CBO's cost function

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.7, 3.0.1, 3.1.0, 3.2.0
    • Fix Version/s: 3.0.2, 3.1.0, 3.2.0
    • Component/s: SQL
    • Labels:
      None

      Description

      The parameter spark.sql.cbo.joinReorder.card.weight is documented as:

      spark.sql.cbo.joinReorder.card.weight
      The weight of cardinality (number of rows) for plan cost comparison in join reorder: rows * weight + size * (1 - weight).
      

      But in the implementation the formula is a bit different:

      Current implementation
          def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
            if (other.planCost.card == 0 || other.planCost.size == 0) {
              false
            } else {
              val relativeRows = BigDecimal(this.planCost.card) / BigDecimal(other.planCost.card)
              val relativeSize = BigDecimal(this.planCost.size) / BigDecimal(other.planCost.size)
              relativeRows * conf.joinReorderCardWeight +
                relativeSize * (1 - conf.joinReorderCardWeight) < 1
            }
          }
      

      This difference has an unfortunate consequence:
      given two plans A and B, both A betterThan B and B betterThan A can return false. This happens when one plan has many rows of small size and the other has few rows of large size.

      Example values that exhibit this phenomenon with the default weight (0.7):
      A.card = 500, B.card = 300
      A.size = 30, B.size = 80
      Both A betterThan B and B betterThan A yield a score above 1 (roughly 1.28 and 1.22, respectively), so both return false.
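
      The arithmetic above can be checked with a minimal standalone sketch. The Cost case class and the ratioBetterThan function are hypothetical stand-ins, not Spark's actual JoinPlan or Cost classes, and the weight is passed explicitly instead of being read from SQLConf. The sketch uses top-level definitions, so it assumes a Scala 3 file, a script, or the REPL.

```scala
// Hypothetical stand-in for Spark's plan cost; not the real JoinPlan/Cost classes.
final case class Cost(card: BigInt, size: BigInt)

// Mirrors the ratio-based comparison quoted above, with the weight passed explicitly.
def ratioBetterThan(plan: Cost, other: Cost, weight: Double): Boolean =
  if (other.card == 0 || other.size == 0) {
    false
  } else {
    val relativeRows = BigDecimal(plan.card) / BigDecimal(other.card)
    val relativeSize = BigDecimal(plan.size) / BigDecimal(other.size)
    relativeRows * weight + relativeSize * (1 - weight) < 1
  }

val a = Cost(card = 500, size = 30)
val b = Cost(card = 300, size = 80)
val weight = 0.7 // default weight mentioned in the description

// A vs B: (500/300) * 0.7 + (30/80) * 0.3, which is above 1
val aBeatsB = ratioBetterThan(a, b, weight)
// B vs A: (300/500) * 0.7 + (80/30) * 0.3, which is also above 1
val bBeatsA = ratioBetterThan(b, a, weight)
```

      Under these assumptions both aBeatsB and bBeatsA are false, so neither plan is considered better than the other.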

      A new implementation is proposed, that matches the documentation:

      Proposed implementation
          def betterThan(other: JoinPlan, conf: SQLConf): Boolean = {
            // Weighted cost as documented: rows * weight + size * (1 - weight).
            val thisCost = BigDecimal(this.planCost.card) * conf.joinReorderCardWeight +
              BigDecimal(this.planCost.size) * (1 - conf.joinReorderCardWeight)
            val otherCost = BigDecimal(other.planCost.card) * conf.joinReorderCardWeight +
              BigDecimal(other.planCost.size) * (1 - conf.joinReorderCardWeight)
            // This plan is better only if its own cost is strictly lower.
            thisCost < otherCost
          }
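
      For comparison, here is a minimal standalone sketch of a weighted-sum comparison that follows the documented formula, applied to the same example values. Cost, weightedCost, and weightedBetterThan are hypothetical names introduced for illustration, and the sketch assumes that a plan is better when its own weighted cost is strictly lower. It uses top-level definitions, so it assumes a Scala 3 file, a script, or the REPL.

```scala
// Hypothetical stand-in for Spark's plan cost; not the real JoinPlan/Cost classes.
final case class Cost(card: BigInt, size: BigInt)

// Documented formula: rows * weight + size * (1 - weight).
def weightedCost(cost: Cost, weight: Double): BigDecimal =
  BigDecimal(cost.card) * weight + BigDecimal(cost.size) * (1 - weight)

// Assumption: a plan is better when its weighted cost is strictly lower.
def weightedBetterThan(plan: Cost, other: Cost, weight: Double): Boolean =
  weightedCost(plan, weight) < weightedCost(other, weight)

val a = Cost(card = 500, size = 30)
val b = Cost(card = 300, size = 80)
val weight = 0.7 // default weight mentioned in the description

// Cost of A is about 500 * 0.7 + 30 * 0.3 = 359
// Cost of B is about 300 * 0.7 + 80 * 0.3 = 234
val aBeatsB = weightedBetterThan(a, b, weight)
val bBeatsA = weightedBetterThan(b, a, weight)
```

      With a single absolute cost per plan, the two comparisons can no longer both return false unless the costs are equal: here bBeatsA is true and aBeatsB is false.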
      

            People

            • Assignee: tanelk Tanel Kiis
            • Reporter: tanelk Tanel Kiis
