Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-16026 Cost-based Optimizer Framework
  3. SPARK-22626

Wrong Hive table statistics may trigger OOM if enables CBO

    XMLWordPrintableJSON

Details

    • Sub-task
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 2.3.0
    • 2.3.0
    • SQL
    • None

    Description

      How to reproduce:

      bin/spark-shell --conf spark.sql.cbo.enabled=true
      
      import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec
      
      spark.sql("CREATE TABLE small (c1 bigint) TBLPROPERTIES ('numRows'='3', 'rawDataSize'='600','totalSize'='800')")
      // Big table with wrong statistics, numRows=0
      spark.sql("CREATE TABLE big (c1 bigint) TBLPROPERTIES ('numRows'='0', 'rawDataSize'='60000000000', 'totalSize'='8000000000000')")
      
      val plan = spark.sql("select * from small t1 join big t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan
      val buildSide = plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide
      
      println(buildSide)
      

      The result is BuildRight, but the right side is the big table.

      Attachments

        Activity

          People

            yumwang Yuming Wang
            yumwang Yuming Wang
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: