  1. Spark
  2. SPARK-49408

Poor performance in ProjectingInternalRow


Details

    Description

      In ProjectingInternalRow, colOrdinals is passed as a List. According to the Scala documentation, List.apply runs in linear time, and every method of ProjectingInternalRow calls it for every row. This can have a significant impact on performance.
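To make the linear cost concrete, here is a minimal sketch of what a positional lookup on a linked list has to do. The helper `nthWithSteps` and the sample ordinals are hypothetical and written only for illustration; this is not the standard library's implementation, just the same traversal with a step counter added.

```scala
object ListApplyCost {
  // Illustrative re-implementation of positional lookup on a linked list.
  // Returns the element at index n and the number of nodes stepped over.
  def nthWithSteps[A](xs: List[A], n: Int): (A, Int) = {
    var rest = xs
    var steps = 0
    while (steps < n) {
      rest = rest.tail // follow one link per iteration
      steps += 1
    }
    (rest.head, steps)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical column ordinals, chosen only for the example.
    val colOrdinals = List(4, 2, 7, 1, 9)
    println(nthWithSteps(colOrdinals, 0)) // (4,0)
    println(nthWithSteps(colOrdinals, 4)) // (9,4)
    // The helper returns the same element as the built-in lookup.
    assert(nthWithSteps(colOrdinals, 3)._1 == colOrdinals(3))
  }
}
```

The number of links followed equals the index, so reading the last projected column of a wide row walks the whole list, and that walk is repeated for every row.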

      The following flame graph was captured while running a MERGE INTO SQL statement. A considerable amount of time was spent in List.apply. Changing colOrdinals to an IndexedSeq, whose apply is effectively constant time, would improve performance.
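The proposed change could look roughly like the sketch below. `ProjectedRow` is a hypothetical, heavily simplified stand-in and not Spark's actual ProjectingInternalRow. It assumes the fix is to normalize the ordinals to an IndexedSeq once at construction time, so callers may still pass a List.

```scala
// Hypothetical, simplified stand-in for a projecting row; not Spark's class.
final class ProjectedRow(colOrdinals: Seq[Int]) {
  // Convert once up front; each later lookup is effectively constant time.
  private val ordinals: IndexedSeq[Int] = colOrdinals.toIndexedSeq
  private var underlying: IndexedSeq[Any] = IndexedSeq.empty

  // Point this projection at a new underlying row.
  def project(row: IndexedSeq[Any]): Unit = underlying = row

  def numFields: Int = ordinals.length

  // Map the projected position to the underlying column, then read it.
  def get(i: Int): Any = underlying(ordinals(i))
}

object ProjectedRowDemo {
  def main(args: Array[String]): Unit = {
    val row = new ProjectedRow(List(2, 0)) // a List is accepted, but not kept
    row.project(Vector("a", "b", "c"))
    println(row.get(0)) // c
    println(row.get(1)) // a
  }
}
```

Converting inside the constructor keeps the per-row path free of list traversal without requiring every call site to change the collection type it passes in.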

       

      Flame graph (attached): 20240827-172739.html

      Scala collections performance characteristics: https://docs.scala-lang.org/overviews/collections-2.13/performance-characteristics.html

      Attachments

        1. 20240827-172739.html
          178 kB
          Frank Wong


            People

              wzx Frank Wong
              wzx Frank Wong
              Votes: 0
              Watchers: 3
