Spark / SPARK-26155

Spark SQL performance degradation after applying SPARK-21052 with Q19 of TPC-DS at 3TB scale


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.3.0, 2.3.1, 2.3.2, 2.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      In our test environment, we found a serious performance degradation in Spark 2.3 when running TPC-DS on SKX 8180. Several queries are affected. For example, on 3TB of data, TPC-DS Q19 takes 126 seconds with Spark 2.3 but only 29 seconds with Spark 2.1. We investigated the problem and traced the root cause to the community patch SPARK-21052, which adds metrics to the hash join process. The affected code is L486 and L487. Q19 takes about 30 seconds without these two lines of code and 126 seconds with them.
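
      To illustrate the kind of bookkeeping described above, here is a minimal sketch of a hash map that counts lookups and probes so an average probe count can be reported. It is a toy model only, not the Spark code at L486 and L487: the class `ProbeCountingMap`, the counters `num_key_lookups` and `num_probes`, and the `track_metrics` switch are all hypothetical names chosen for illustration. The point is that the counter updates sit on the lookup path, so they run once per probed row.

```python
class ProbeCountingMap:
    """Toy open-addressing hash map (linear probing) over integer keys.

    Hypothetical illustration only; this is not Spark's implementation.
    """

    def __init__(self, capacity=8, track_metrics=True):
        self.capacity = capacity
        self.keys = [None] * capacity
        self.values = [None] * capacity
        self.size = 0
        self.track_metrics = track_metrics
        self.num_key_lookups = 0  # how many get() calls were made
        self.num_probes = 0       # how many slots were inspected in total

    def put(self, key, value):
        pos = key % self.capacity
        while self.keys[pos] is not None and self.keys[pos] != key:
            pos = (pos + 1) % self.capacity
        if self.keys[pos] is None:
            # Keep at least one slot empty so get() always terminates.
            if self.size >= self.capacity - 1:
                raise RuntimeError("map is full")
            self.size += 1
        self.keys[pos] = key
        self.values[pos] = value

    def get(self, key):
        # Metric updates live on the hot path: they execute on every lookup.
        if self.track_metrics:
            self.num_key_lookups += 1
            self.num_probes += 1
        pos = key % self.capacity
        while self.keys[pos] is not None:
            if self.keys[pos] == key:
                return self.values[pos]
            pos = (pos + 1) % self.capacity
            if self.track_metrics:
                self.num_probes += 1
        return None

    def avg_probes_per_lookup(self):
        if self.num_key_lookups == 0:
            return 0.0
        return self.num_probes / self.num_key_lookups


if __name__ == "__main__":
    m = ProbeCountingMap(capacity=8)
    for k in (1, 9, 17):  # all three keys collide in slot 1
        m.put(k, str(k))
    for k in (1, 9, 17):
        m.get(k)
    print(m.avg_probes_per_lookup())
```

      With `track_metrics=False` the same lookups skip the counter updates, which is the toy equivalent of running without the two lines. Whether such updates explain a slowdown in a real engine depends on how the surrounding code is compiled and executed, which this sketch does not model.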

      Attachments

        1. q19.sql
          0.6 kB
          Ke Jia
        2. Q19 analysis in Spark2.3 with L486&487.pdf
          567 kB
          Ke Jia
        3. Q19 analysis in Spark2.3 without L486&487.pdf
          573 kB
          Ke Jia
        4. tpcds.result.xlsx
          40 kB
          Ke Jia


            People

              Assignee: Unassigned
              Reporter: Ke Jia (Jk_Self)
              Votes: 0
              Watchers: 10
