  Spark / SPARK-11873

Regression for TPC-DS query 63 when used with decimal datatype and window function


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Auto Closed
    • Affects Version/s: 1.5.0
    • Fix Version/s: None
    • Component/s: SQL

    Description

      While running TPC-DS-based queries to benchmark Spark, I found that query 63 (after modifying it to match the original TPC-DS query) behaves differently from other queries with a similar function, e.g. q98.
      Here are the performance numbers (execution time in seconds):

      Query   1.1 Baseline   1.5   1.5 + Decimal
      q63     27             26    38
      q98     18             26    24

      As the numbers show, q63 regresses compared to the similar query: on 1.5 it goes from 26 s to 38 s when the decimal schema is used, whereas q98 goes from 26 s to 24 s. After adding the window function back, this is the only query that seems to be slower in 1.5 than in 1.1.
      I have attached both versions of the queries and the affected schemas.
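      The relative changes behind this comparison can be computed directly from the timings table. The sketch below only restates the reported numbers; the `timings` dictionary and the `pct_change` helper are illustrative names, not part of the original report or of Spark.

```python
# Execution times in seconds, copied from the table in the description.
timings = {
    "q63": {"1.1 Baseline": 27, "1.5": 26, "1.5 + Decimal": 38},
    "q98": {"1.1 Baseline": 18, "1.5": 26, "1.5 + Decimal": 24},
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after` in percent (positive means slower)."""
    return (after - before) / before * 100.0


for query, t in timings.items():
    vs_15 = pct_change(t["1.5"], t["1.5 + Decimal"])
    vs_11 = pct_change(t["1.1 Baseline"], t["1.5 + Decimal"])
    print(f"{query}: 1.5 + Decimal vs 1.5: {vs_15:+.1f}%, vs 1.1 Baseline: {vs_11:+.1f}%")

# Prints:
# q63: 1.5 + Decimal vs 1.5: +46.2%, vs 1.1 Baseline: +40.7%
# q98: 1.5 + Decimal vs 1.5: -7.7%, vs 1.1 Baseline: +33.3%
```

      Measured against plain 1.5, only q63 gets slower with the decimal schema, which is the regression this issue reports.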

      Attachments

        1. double_schema.sql (18 kB, Dileep Kumar)
        2. decimal_schema.sql (18 kB, Dileep Kumar)
        3. 98.1.5 (0.8 kB, Dileep Kumar)
        4. 63.1.5 (1 kB, Dileep Kumar)
        5. 98.1.1 (1 kB, Dileep Kumar)
        6. 63.1.1 (2 kB, Dileep Kumar)
        7. 63.decimal_schema (224 kB, Dileep Kumar)
        8. 63.decimal_schema_windows_function (225 kB, Dileep Kumar)
        9. 63.double_schema (209 kB, Dileep Kumar)


          People

            Assignee: Unassigned
            Reporter: Dileep Kumar (dkumar@cloudera.com)
            Votes: 0
            Watchers: 4
