  1. Spark
  2. SPARK-15719

Disable writing Parquet summary files by default

      Description

      Parquet summary files are not particularly useful nowadays, because:

      1. When schema merging is disabled, we assume that all Parquet part-files share the same schema, so we can read the footer from any single part-file.
      2. When schema merging is enabled, we need to read the footers of all files anyway to do the merge.

      On the other hand, writing summary files can be expensive because the footers of all part-files must be read and merged. This is particularly costly when appending a small dataset to a large existing Parquet dataset.
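
      The trade-off described above can be illustrated with a toy cost model that only counts footer reads. This is a minimal sketch, not Spark's or Parquet's actual code: the function names `footers_to_infer_schema` and `footers_to_write_summary` are hypothetical, and the model assumes that refreshing a summary file touches every part-file footer in the dataset, as stated in the description.

      ```python
      # Toy cost model: counts footer reads only.
      # Hypothetical helpers for illustration; not Spark or Parquet APIs.


      def footers_to_infer_schema(num_part_files: int, merge_schema: bool) -> int:
          """Footers a reader must open to determine the dataset schema."""
          if num_part_files == 0:
              return 0
          # Merging needs every footer; otherwise any single part-file will do.
          return num_part_files if merge_schema else 1


      def footers_to_write_summary(existing: int, appended: int,
                                   write_summary: bool) -> int:
          """Footers a writer must read and merge to refresh the summary file."""
          # Assumption: the summary covers the whole dataset, so the footers
          # of pre-existing part-files are read again on every append.
          return existing + appended if write_summary else 0


      if __name__ == "__main__":
          # Appending 2 part-files to a dataset that already has 10,000.
          print(footers_to_infer_schema(10_000, merge_schema=False))
          print(footers_to_infer_schema(10_000, merge_schema=True))
          print(footers_to_write_summary(10_000, 2, write_summary=True))
          print(footers_to_write_summary(10_000, 2, write_summary=False))
      ```

      Under these assumptions, the write-side cost grows with the size of the existing dataset rather than with the size of the append, while the read side needs either a single footer or all of them regardless of whether a summary file exists.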

            People

            • Assignee:
              Cheng Lian
              Reporter:
              Cheng Lian