Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
3.5.1, 3.4.3
Description
InternalRowComparableWrapper recreates row ordering for each output partition when SPJ is enabled. The row ordering is generated via codegen which is quite expensive and the output partitions might be quite large for production table such as hundreds of thousands partitions. We encountered this issue when applying SPJ with multiple large Iceberg tables and the plan phase took tens of minutes to complete.
Attaching a screenshot to provide related stack trace:
A simple fix for this would be caching the rowOrdering for InternalRowComparableWrapper as the datatype of the InternalRow is immutable
Attachments
Attachments
Issue Links
- is related to
-
SPARK-37375 Umbrella: Storage Partitioned Join (SPJ)
- Resolved
- links to