Spark / SPARK-39971

ANALYZE TABLE makes some queries run forever

Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.2.2
    • Fix Version/s: None
    • Component/s: Optimizer, SQL
    • Labels: None

    Description

      I'm using TPC-DS to run benchmarks, and after running ANALYZE TABLE (without FOR ALL COLUMNS), some queries became very slow. For example, query24 (https://raw.githubusercontent.com/Agirish/tpcds/master/query24.sql) takes 10 to 15 minutes before running ANALYZE TABLE.

      After running ANALYZE TABLE, the same query ran for 24 hours without finishing, and I cancelled it.
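      The reproduction described above amounts roughly to the following Spark SQL. This is a sketch, not the exact script used: it assumes the TPC-DS tables are registered under their standard names in the current database, and it only lists the tables that appear in the row counts below (the tables joined by query24).

      ```sql
      -- Table-level statistics only: without NOSCAN this collects sizeInBytes
      -- and rowCount, and without FOR ALL COLUMNS no column statistics are computed.
      ANALYZE TABLE store_sales COMPUTE STATISTICS;
      ANALYZE TABLE store_returns COMPUTE STATISTICS;
      ANALYZE TABLE store COMPUTE STATISTICS;
      ANALYZE TABLE item COMPUTE STATISTICS;
      ANALYZE TABLE customer COMPUTE STATISTICS;
      ANALYZE TABLE customer_address COMPUTE STATISTICS;

      -- Then run query24 from the linked query24.sql with CBO and join reordering enabled.
      ```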

      If I disable either spark.sql.cbo.joinReorder.enabled or spark.sql.cbo.enabled, the query becomes fast again.
      It seems that join reordering does not work well when table-level statistics are present but column-level statistics are not.
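      A sketch of how to check which statistics exist and how to apply the workaround for a session follows. Both spark.sql.cbo.enabled and spark.sql.cbo.joinReorder.enabled default to false in Spark 3.2, so they have to be turned on explicitly to hit this behavior. The column name ss_item_sk is taken from the standard TPC-DS schema and is used here only as an illustration.

      ```sql
      -- Table-level stats: the "Statistics" row shows size in bytes and row count.
      DESCRIBE TABLE EXTENDED store_sales;
      -- Column-level stats: with table-level stats only, fields such as
      -- distinct_count, min and max are not populated for the column.
      DESCRIBE TABLE EXTENDED store_sales ss_item_sk;

      -- Workaround 1: keep CBO but disable cost-based join reordering.
      SET spark.sql.cbo.joinReorder.enabled=false;
      -- Workaround 2: disable the cost-based optimizer entirely.
      SET spark.sql.cbo.enabled=false;
      ```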

      Row counts:
      store_sales - 2879966589
      store_returns - 288009578
      store - 1002
      item - 300000
      customer - 12000000
      customer_address - 6000000

          People

            Assignee: Unassigned
            Reporter: Felipe (felipepessoto)
            Votes: 1
            Watchers: 3