[SPARK-49352] Avoid redundant array transform for identical expression

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 4.0.0, 3.3.4, 3.5.2, 3.4.3
Fix Version/s: 4.0.0, 3.4.4, 3.5.3
Component/s: SQL
Labels:
- pull-request-available

Description

Our customer encountered a significant performance regression when migrating from Spark 3.2 to Spark 3.4 on an `Insert Into` query, which is analyzed as an `AppendData` on an Iceberg table.

We found the root cause: in Spark 3.4, `TableOutputResolver` resolves the query with an additional `ArrayTransform` on an `ArrayType` field. The lambda function of that `ArrayTransform` is an identity function (it returns each element unchanged), so the transformation is redundant.
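
To illustrate why such a transform is redundant, here is a minimal sketch in Python using a toy expression tree. The classes `Column`, `LambdaVar`, `Cast`, and `ArrayTransform` and the function `simplify` are hypothetical stand-ins written for this example; they are not Spark's Catalyst classes, and this is not the code from the linked pull requests. The actual fix may well avoid generating the transform in the first place rather than removing it afterwards.

```python
from dataclasses import dataclass


# Toy expression nodes (hypothetical, NOT Spark's Catalyst classes).
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class LambdaVar:
    name: str


@dataclass(frozen=True)
class Cast:
    child: object
    to_type: str


@dataclass(frozen=True)
class ArrayTransform:
    array: object    # expression producing the input array
    var: LambdaVar   # lambda parameter bound to each element
    body: object     # lambda body evaluated for each element


def simplify(expr):
    """Drop any ArrayTransform whose lambda body is just its own parameter."""
    if isinstance(expr, ArrayTransform):
        array = simplify(expr.array)
        if expr.body == expr.var:
            # Identity lambda (x -> x): the output equals the input array.
            return array
        return ArrayTransform(array, expr.var, expr.body)
    if isinstance(expr, Cast):
        return Cast(simplify(expr.child), expr.to_type)
    return expr


x = LambdaVar("x")
redundant = ArrayTransform(Column("tags"), x, x)               # x -> x
needed = ArrayTransform(Column("tags"), x, Cast(x, "bigint"))  # x -> cast(x)
```

With these definitions, `simplify(redundant)` returns the bare `Column("tags")`, while `simplify(needed)` keeps the transform because its lambda actually changes each element. Skipping the identity case avoids a per-element lambda evaluation over every array in the written data.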

Issue Links

links to

GitHub Pull Request #47843

GitHub Pull Request #47862

GitHub Pull Request #47863

People

Assignee: L. C. Hsieh

Reporter: L. C. Hsieh

Votes: 0

Watchers: 3

Dates

Created: 22/Aug/24 06:41

Updated: 5 days ago 10:12

Resolved: 23/Aug/24 22:43