Spark / SPARK-32532

Improve ORC read/write performance on nested structs and array of structs

Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      I have some improvements to the ORC file format code path that reduce the time taken to read and write nested structs and arrays of structs. Using the benchmarks from SPARK-32531, I was able to improve performance on branch-3.0 as follows (measurements in seconds):

      Read:
      Nested Structs: 184 -> 44
      Array of Struct: 66 -> 15

      Write:
      Nested Structs: 543 -> 39
      Array of Struct: 330 -> 37
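
To put the before/after timings in relative terms, here is a minimal sketch that computes the speedup factor for each case. The numbers are copied from the measurements above; the `timings` dictionary and the `speedup` helper are names introduced here for illustration only and are not part of the benchmark in SPARK-32531.

```python
# Timings in seconds, copied from the description: (before, after).
# The dictionary layout and helper name are illustrative assumptions.
timings = {
    ("read", "nested structs"): (184, 44),
    ("read", "array of structs"): (66, 15),
    ("write", "nested structs"): (543, 39),
    ("write", "array of structs"): (330, 37),
}


def speedup(before, after):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before / after


# Speedup factor per (operation, data shape) pair.
speedups = {case: speedup(*pair) for case, pair in timings.items()}

for (op, shape), factor in speedups.items():
    before, after = timings[(op, shape)]
    print(f"{op:5s} {shape:17s} {before:>3d}s -> {after:>2d}s  ({factor:.1f}x faster)")
```

On these figures, reads come out roughly 4x faster and writes roughly 9x to 14x faster, with the largest gain on writing nested structs.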

      I will put up a PR with the changes soon.

            People

              Assignee: Unassigned
              Reporter: Muhammad Samir Khan (samkhan)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated: