Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-49352

Avoid redundant array transform for identical expression

    XMLWordPrintableJSON

Details

    Description

      Our customer encounters significant performance regression when migrating from Spark 3.2 to Spark 3.4 on a `Insert Into` query which is analyzed as a `AppendData` on an Iceberg table.

      We found that the root cause is in Spark 3.4, `TableOutputResolver` resolves the query with additional `ArrayTransform` on an `ArrayType` field. The `ArrayTransform`'s lambda function is actually an identical function, i.e., the transformation is redundant.

      Attachments

        Activity

          People

            viirya L. C. Hsieh
            viirya L. C. Hsieh
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: