Description
This issue splits out the tests originally posted in [PR|https://github.com/apache/spark/pull/29352] for SPARK-32531. The added tests cover maps and arrays of nested structs across different file formats. For example, https://github.com/apache/spark/pull/29353 and https://github.com/apache/spark/pull/29354 add object reuse when reading ORC and Avro files. However, for dynamic data structures like arrays and maps, the size cannot be determined from the schema alone, so they have to be allocated while the data is being read. The added tests provide coverage so that objects are not accidentally reused when reading maps and arrays.
AFAIK this is not covered by existing tests.
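To illustrate the failure mode these tests guard against, here is a minimal sketch in plain Python. It is not Spark code and not taken from the PRs above: `read_reusing`, `read_allocating`, and `raw_rows` are hypothetical names used only for illustration. It assumes a reader that hands back one row at a time, where each row is an array of structs whose length differs from row to row.

```python
from typing import Dict, Iterator, List

Struct = Dict[str, int]


def read_reusing(raw_rows: List[List[int]]) -> Iterator[List[Struct]]:
    """Hypothetical buggy reader: reuses one list object for every row."""
    shared: List[Struct] = []
    for raw in raw_rows:
        shared.clear()
        shared.extend({"a": v} for v in raw)
        yield shared  # the same object is yielded each time


def read_allocating(raw_rows: List[List[int]]) -> Iterator[List[Struct]]:
    """Hypothetical safe reader: allocates a new list for every row."""
    for raw in raw_rows:
        yield [{"a": v} for v in raw]


# Array lengths vary per row, so they cannot be derived from the schema.
raw_rows = [[1, 2, 3], [4], []]
expected = [[{"a": 1}, {"a": 2}, {"a": 3}], [{"a": 4}], []]

# Collecting the rows keeps references to them, which exposes the aliasing.
reused = list(read_reusing(raw_rows))
fresh = list(read_allocating(raw_rows))

print(fresh == expected)   # True
print(reused == expected)  # False: every entry aliases the last row read
```

A test of this kind only catches the problem if it holds on to the rows after reading them and if consecutive rows have different sizes, which is why data with variable-length arrays and maps is useful here.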
Issue Links
- is related to
  - SPARK-32532 Improve ORC read/write performance on nested structs and array of structs (In Progress)
  - SPARK-32533 Improve Avro read/write performance on nested structs and array of structs (In Progress)
  - SPARK-32531 Add benchmarks for nested structs and arrays for different file formats (In Progress)