Spark / SPARK-30338

Avoid unnecessary InternalRow copies in ParquetRowConverter

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

      Description

      ParquetRowConverter calls InternalRow.copy() in cases where the copy is unnecessary; this can severely harm performance when reading deeply nested Parquet data.

      It looks like this copying was originally added to handle arrays and maps of structs, where the copy is still required, but it can be omitted for the more common case of structs nested directly in structs.
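      The distinction between the two cases can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Spark's code: `MutableRow`, `read_array_of_structs`, and `read_struct_in_struct` are hypothetical names, and the model assumes the converter reuses one mutable row buffer per struct and that the caller consumes each top-level row before the next record is read.

```python
class MutableRow:
    """Toy stand-in for a reusable row buffer (hypothetical, not Spark's InternalRow)."""

    def __init__(self, num_fields):
        self.values = [None] * num_fields

    def update(self, index, value):
        self.values[index] = value

    def copy(self):
        # Shallow copy of the field values into a fresh row.
        new_row = MutableRow(len(self.values))
        new_row.values = list(self.values)
        return new_row


def read_array_of_structs(elements, copy_each):
    """Collect several struct values that all pass through ONE reused buffer."""
    buffer = MutableRow(1)  # assumption: a single buffer is reused per element
    collected = []
    for element in elements:
        buffer.update(0, element)
        # Without a copy, every array slot aliases the same buffer.
        collected.append(buffer.copy() if copy_each else buffer)
    return [row.values[0] for row in collected]


def read_struct_in_struct(records, copy_inner):
    """Read one nested struct per record; the parent holds a single inner row."""
    inner = MutableRow(1)
    outer = MutableRow(1)
    seen = []
    for record in records:
        inner.update(0, record)
        outer.update(0, inner.copy() if copy_inner else inner)
        # Assumption: the outer row is consumed here, before the buffers
        # are overwritten by the next record.
        seen.append(outer.values[0].values[0])
    return seen
```

      In this model, skipping the copy corrupts the array case (every element ends up equal to the last one written) but leaves the struct-in-struct case unchanged, which is the situation in which the copy is pure overhead.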

              People

              • Assignee: Josh Rosen (joshrosen)
              • Reporter: Josh Rosen (joshrosen)