  1. Spark
  2. SPARK-26155

Spark SQL performance degradation after applying SPARK-21052, seen with TPC-DS Q19 at 3 TB scale

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.3.0, 2.3.1, 2.3.2, 2.4.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels:
      None

      Description

      In our test environment, we found a serious performance degradation in Spark 2.3 when running TPC-DS on SKX 8180. Several queries are affected. For example, on 3 TB of data, TPC-DS Q19 takes 126 seconds with Spark 2.3 but only 29 seconds with Spark 2.1. We investigated and traced the root cause to the community patch SPARK-21052, which adds metrics to the hash join process. The affected code is L486 and L487. Q19 takes about 30 seconds without these two lines of code and 126 seconds with them.
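
      To put the reported timings in proportion, here is a minimal arithmetic sketch. It uses only the three numbers quoted in the description; the variable names are illustrative, and the "about 30 seconds" figure is treated as exactly 30, which is an assumption.

```python
# Timings quoted in the description (seconds, TPC-DS Q19, 3 TB data).
spark_21_s = 29            # Spark 2.1
spark_23_s = 126           # Spark 2.3 with the two lines present
spark_23_without_s = 30    # Spark 2.3 without the two lines ("about 30", assumed exact)

# Overall regression relative to Spark 2.1.
slowdown = spark_23_s / spark_21_s

# Run time that disappears when the two lines are removed.
attributed_s = spark_23_s - spark_23_without_s
attributed_share = attributed_s / spark_23_s

print(f"slowdown vs Spark 2.1: {slowdown:.1f}x")
print(f"time attributed to the two lines: {attributed_s} s ({attributed_share:.0%})")
```

      On these figures, Spark 2.3 is roughly 4.3x slower than Spark 2.1 for Q19, and about three quarters of the Spark 2.3 run time goes away when the two lines are removed.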

        Attachments

        1. tpcds.result.xlsx
          40 kB
          Ke Jia
        2. Q19 analysis in Spark2.3 without L486&487.pdf
          573 kB
          Ke Jia
        3. Q19 analysis in Spark2.3 with L486&487.pdf
          567 kB
          Ke Jia
        4. q19.sql
          0.6 kB
          Ke Jia

              People

              • Assignee: Unassigned
              • Reporter: Jk_Self Ke Jia
              • Votes: 0
              • Watchers: 11