Spark / SPARK-41276

Optimize constructor use of `StructType`

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.0
    • Component/s: MLlib, SQL
    • Labels: None

    Description

      There are two main ways to construct `StructType`:

      • Primary constructor

      ```scala
      case class StructType(fields: Array[StructField])
      ```

      • Companion `apply` method that takes a `Seq` as input

      ```scala
      def apply(fields: Seq[StructField]): StructType = StructType(fields.toArray)
      ```

      Both construction methods are widely used in Spark, but the latter converts the `Seq` to an `Array` on every call, which is one extra collection conversion.
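
      As a minimal sketch of the difference, the following builds the same schema through both paths. The object name and the `id` and `name` fields are hypothetical and serve only as illustration; the example assumes Spark SQL is on the classpath.

      ```scala
      import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

      object StructTypeConstruction {
        // Hypothetical fields, used only for illustration.
        private val id = StructField("id", IntegerType, nullable = false)
        private val name = StructField("name", StringType)

        // Primary constructor: the Array is passed straight to StructType.
        val fromArray: StructType = StructType(Array(id, name))

        // Seq overload: delegates to the primary constructor via fields.toArray,
        // so a Seq is built first and then copied into a new Array.
        val fromSeq: StructType = StructType(Seq(id, name))
      }
      ```

      Both values describe the same schema; the only difference is the intermediate `Seq` and the `toArray` copy on the second path.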

      This PR changes the following three scenarios to call the primary constructor directly, which saves one collection conversion:

      1. Where the input `Seq` is written out by hand, write out an `Array` instead, for example:

      https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala#L55-L63

      2. Where `toSeq` was added to build the input for compatibility with Scala 2.13, call `toArray` directly instead, for example:

      https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/connector/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L108-L113

      3. Where the input is already an `Array`, remove the redundant `toSeq`, for example:

      https://github.com/apache/spark/blob/bcf03fe3f86a7230fd977c059b73a58554370d5d/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala#L587-L592
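
      The three rewrites can be sketched as before/after pairs. This is not the code at the linked locations: the object name, field names, and the `ArrayBuffer` input are assumptions chosen to mirror each scenario, and Spark SQL is assumed to be on the classpath.

      ```scala
      import scala.collection.mutable.ArrayBuffer
      import org.apache.spark.sql.types.{DoubleType, LongType, StringType, StructField, StructType}

      object StructTypeRewrites {
        // Scenario 1: a hand-written field list. Write an Array instead of a Seq.
        val manualBefore: StructType =
          StructType(Seq(StructField("a", LongType), StructField("b", StringType)))
        val manualAfter: StructType =
          StructType(Array(StructField("a", LongType), StructField("b", StringType)))

        // Scenario 2: fields collected in a mutable buffer. On Scala 2.13 the buffer
        // is not an immutable Seq, so toSeq was needed; toArray skips that step.
        private val buffer = ArrayBuffer(StructField("x", DoubleType), StructField("y", DoubleType))
        val bufferBefore: StructType = StructType(buffer.toSeq)
        val bufferAfter: StructType = StructType(buffer.toArray)

        // Scenario 3: the input is already an Array, and map on an Array returns
        // an Array, so the trailing toSeq is redundant.
        private val fields: Array[StructField] =
          Array(StructField("k", StringType), StructField("v", LongType))
        val arrayBefore: StructType = StructType(fields.map(_.copy(nullable = false)).toSeq)
        val arrayAfter: StructType = StructType(fields.map(_.copy(nullable = false)))
      }
      ```

      In each pair the resulting schema is unchanged; only the intermediate `Seq` is avoided.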

          People

            Assignee: LuciferYang Yang Jie
            Reporter: LuciferYang Yang Jie