Details
- Improvement
- Status: In Progress
- Major
- Resolution: Unresolved
- 3.0.0
- None
- None
Description
I have some improvements for the ORC file format that reduce the time taken when reading and writing nested structs and arrays of structs. Using the benchmarks in SPARK-32531, I was able to improve performance on branch-3.0 as follows (measurements in seconds):
Read:
Nested Structs: 184 -> 44
Array of Struct: 66 -> 15
Write:
Nested Structs: 543 -> 39
Array of Struct: 330 -> 37
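For a quicker read of these numbers, the implied speedup factors can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the timings are copied from the lists above, and the names `timings`, `speedup` and `speedups` are made up for illustration and are not part of the benchmark code in SPARK-32531.

```python
# Timings in seconds as (before, after), copied from the lists above.
# The dictionary layout and all names here are illustrative assumptions.
timings = {
    ("read", "nested structs"): (184, 44),
    ("read", "array of struct"): (66, 15),
    ("write", "nested structs"): (543, 39),
    ("write", "array of struct"): (330, 37),
}


def speedup(before, after):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before / after


# Speedup factor per (operation, case), rounded to one decimal place.
speedups = {
    key: round(speedup(before, after), 1)
    for key, (before, after) in timings.items()
}

for (operation, case), factor in speedups.items():
    print(f"{operation:5s} {case:16s} {factor:.1f}x")
```

That works out to roughly 4.2x and 4.4x for reads, and roughly 13.9x and 8.9x for writes.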
I will be putting up a PR with the changes soon.
Issue Links
- relates to
  - SPARK-32550 Make SpecificInternalRow constructors faster by using while loops instead of maps (Resolved)
  - SPARK-32531 Add benchmarks for nested structs and arrays for different file formats (In Progress)
  - SPARK-32731 Add tests for arrays/maps of nested structs to ReadSchemaSuite to test structs reuse (In Progress)