Spark / SPARK-30413

Avoid unnecessary WrappedArray roundtrip in GenericArrayData constructor

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      GenericArrayData has a constructor which accepts a seqOrArray: Any parameter. This constructor was originally added for use in situations where we don't know the actual type at compile-time (e.g. when converting UDF outputs). It's also called (perhaps unintentionally) in code paths where we could plausibly and statically know that the type is Array[Any] (in which case we could simply call the primary constructor).

      In the current version of this code there is an unnecessary performance penalty for going through this path when seqOrArray is an Array[Any]: the array is converted into a WrappedArray and then unwrapped back into an array by a further method call, which adds a number of unnecessary method calls. See https://scastie.scala-lang.org/7jOHydbNTaGSU677FWA8nA for an example of situations where this can crop up.
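The roundtrip described above can be illustrated with a minimal, Spark-free sketch. `RoundTripSketch` and `viaSeq` are hypothetical names chosen for illustration, not Spark's actual code, and the sketch assumes Scala 2.12 or 2.13. On the 2.12 line the intermediate `Seq` is a `WrappedArray`, as the description says; other Scala versions may use a different wrapper or copy the data.

```scala
import scala.collection.{Seq => AnySeq}

object RoundTripSketch {
  // Hypothetical helper, not Spark's code: normalise the input to a Seq first,
  // then turn that Seq back into an Array[Any].
  def viaSeq(seqOrArray: Any): Array[Any] = {
    val seq: AnySeq[Any] = seqOrArray match {
      case s: AnySeq[_] => s
      case a: Array[_]  => a.toSeq // the array is wrapped (or copied) into a Seq here
    }
    seq.toArray[Any] // ...and converted back into an array here
  }
}
```

When the caller already holds an `Array[Any]`, both conversion steps are pure overhead: the result has the same contents as the input.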

      Via a small modification to this constructor's implementation, I think we can effectively remove this penalty.
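One possible shape for such a modification is sketched below. It is an assumption about the approach rather than the change that was actually merged: `FastPathSketch` and `toObjectArray` are hypothetical names, and the sketch assumes Scala 2.12 or 2.13. The idea is to match on the array type first and return an object array untouched, converting only the inputs that really need it.

```scala
import scala.collection.{Seq => AnySeq}

object FastPathSketch {
  // Hypothetical helper, not Spark's code. Any other input type would
  // throw a MatchError in this sketch.
  def toObjectArray(seqOrArray: Any): Array[Any] = seqOrArray match {
    // Arrays of references (Object[] at runtime) are returned as-is:
    // no wrapping and no copy.
    case objects: Array[Any] => objects
    // Primitive arrays such as Array[Int] are boxed into a new Array[Any].
    case other: Array[_]     => other.toArray[Any]
    // Sequences are copied into an Array[Any].
    case seq: AnySeq[_]      => seq.toArray[Any]
  }
}
```

With this ordering, passing an `Array[Any]` returns the very same array instance, while primitive arrays and sequences still go through a conversion.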

            People

              Assignee: Josh Rosen (joshrosen)
              Reporter: Josh Rosen (joshrosen)
