Spark / SPARK-8604

Parquet data source doesn't write summary file while doing appending

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: SQL
    • Labels: None

    Description

      Currently, the Parquet and ORC data sources don't set their output format class, because Spark SQL overrides the output committer. However, SPARK-8678 ignores the user-defined output committer class when appending, to avoid potential issues caused by direct output committers (e.g. DirectParquetOutputCommitter). As a result, both data sources fall back to the default output committer retrieved from TextOutputFormat, which is FileOutputCommitter. For ORC this is fine, since ORC itself just uses FileOutputCommitter. For Parquet, however, ParquetOutputCommitter also writes the summary files while committing the job, so an append that goes through FileOutputCommitter leaves the summary files unwritten.
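
The fallback described above can be illustrated with a small, self-contained toy model. This is not Spark or Hadoop code: the Python classes only borrow the committer names from the description, and `default_committer`, `select_committer`, and the string-based output format argument are hypothetical helpers introduced for illustration. The sketch assumes that an unset output format class behaves like TextOutputFormat, and that the summary files are Parquet's `_metadata` and `_common_metadata`.

```python
from typing import List, Optional


class FileOutputCommitter:
    """Toy stand-in: committing the job only publishes the data files."""

    def commit_job(self, data_files: List[str]) -> List[str]:
        return list(data_files)


class ParquetOutputCommitter(FileOutputCommitter):
    """Toy stand-in: also writes the Parquet summary files on job commit."""

    def commit_job(self, data_files: List[str]) -> List[str]:
        return super().commit_job(data_files) + ["_metadata", "_common_metadata"]


def default_committer(output_format: Optional[str]) -> FileOutputCommitter:
    # Hypothetical helper: the default committer is derived from the output
    # format class. An unset class (None) is modeled as TextOutputFormat,
    # whose committer is FileOutputCommitter.
    if output_format == "ParquetOutputFormat":
        return ParquetOutputCommitter()
    return FileOutputCommitter()


def select_committer(
    output_format: Optional[str],
    user_committer: Optional[FileOutputCommitter],
    appending: bool,
) -> FileOutputCommitter:
    # When appending, the user-defined committer is ignored, so the choice
    # falls through to whatever the output format class provides.
    if appending or user_committer is None:
        return default_committer(output_format)
    return user_committer


files = ["part-00000.parquet"]

# Output format class unset: the append falls back to FileOutputCommitter.
buggy = select_committer(None, ParquetOutputCommitter(), appending=True)
print(buggy.commit_job(files))

# Output format class set: the append still gets a Parquet-aware committer.
fixed = select_committer("ParquetOutputFormat", ParquetOutputCommitter(), appending=True)
print(fixed.commit_job(files))
```

In this model, the first call returns only the data file, while the second also lists the summary files. It suggests one way the problem could be avoided, namely giving the Parquet data source an output format class from which a summary-writing committer can be derived; whether the actual fix takes that form is not stated in this report.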

          People

            Assignee: Cheng Lian
            Reporter: Cheng Lian