Spark / SPARK-30413

Avoid unnecessary WrappedArray roundtrip in GenericArrayData constructor

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

      Description

      GenericArrayData has a constructor which accepts a seqOrArray: Any parameter. This constructor was originally added for use in situations where we don't know the actual type at compile-time (e.g. when converting UDF outputs). It's also called (perhaps unintentionally) in code paths where we could plausibly and statically know that the type is Array[Any] (in which case we could simply call the primary constructor).

      In the current version of this code there's an unnecessary performance penalty for going through this path when seqOrArray is an Array[Any]: we end up converting the array into a WrappedArray and then calling a method to unwrap it back into an array, which results in a bunch of unnecessary method calls. See https://scastie.scala-lang.org/7jOHydbNTaGSU677FWA8nA for an example of situations where this can crop up.

      Via a small modification to this constructor's implementation, I think we can effectively remove this penalty.
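
      To make the roundtrip concrete, here is a minimal standalone sketch. It is not the Spark source: ArrayRoundtripSketch, viaRoundtrip, and direct are hypothetical names used only for illustration, and the pattern-match order in direct is one plausible shape for the "small modification", assuming Scala 2.12 or 2.13.

```scala
// Hypothetical sketch; names and structure are assumptions, not Spark code.
object ArrayRoundtripSketch {

  // Roundtrip path: an array is first wrapped as a Seq, then turned back
  // into an Array[Any], even when the input already was an Array[Any].
  def viaRoundtrip(seqOrArray: Any): Array[Any] = {
    val seq: Seq[Any] = seqOrArray match {
      case s: Seq[_]   => s
      case a: Array[_] => a.toSeq // wraps the array
    }
    seq.toArray[Any] // unwraps (or copies) back into an array
  }

  // Direct path: an array of objects is returned as-is, so the wrap and
  // unwrap calls are skipped entirely for that case.
  def direct(seqOrArray: Any): Array[Any] = seqOrArray match {
    case a: Array[Any] => a                    // object array: nothing to convert
    case a: Array[_]   => a.toSeq.toArray[Any] // primitive array: box the elements
    case s: Seq[_]     => s.toArray[Any]
  }
}
```

      With this sketch, direct hands back the very same array instance for an Array[Any] input, while primitive arrays and Seq inputs still go through a conversion, so only the already-converted case changes.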

              People

              • Assignee: Josh Rosen (joshrosen)
              • Reporter: Josh Rosen (joshrosen)