  1. Spark
  2. SPARK-8604

Parquet data source doesn't write summary file while doing appending

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: SQL
    • Labels:
      None

      Description

      Currently, the Parquet and ORC data sources don't set their output format class, because Spark SQL overrides the output committer. However, SPARK-8578 ignores the user-defined output committer class when appending, to avoid potential issues caused by direct output committers (e.g. DirectParquetOutputCommitter). This makes both data sources fall back to the default output committer retrieved from TextOutputFormat, which is FileOutputCommitter. For ORC this is fine, since ORC itself just uses FileOutputCommitter. For Parquet, however, ParquetOutputCommitter also writes the summary files while committing the job, so falling back to FileOutputCommitter means no summary files are written when appending.
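
      The fallback described above can be illustrated with a small toy model. This is only a sketch and not Spark's actual code: the function names, parameters, and the string-based committer names are hypothetical, chosen to mirror the wording of the description. The third parameter models the case where a data source does set an output format class, which is an assumption about how the gap could be closed rather than something the description states.

```python
# Toy model of the committer selection described in this issue.
# NOT Spark code: all function and parameter names here are hypothetical.

# Committer obtained from TextOutputFormat when nothing else applies.
DEFAULT_COMMITTER = "FileOutputCommitter"

# Committers that write Parquet summary files at job commit (per the description).
SUMMARY_WRITING_COMMITTERS = {"ParquetOutputCommitter"}


def choose_committer(user_committer, output_format_committer, appending):
    """Return the name of the committer a write job would end up with.

    user_committer: committer class name configured explicitly, or None.
    output_format_committer: committer supplied by the data source's output
        format class, or None when the data source does not set one.
    appending: True when the job appends to existing data.
    """
    # The user-defined committer is only honored when not appending.
    if user_committer is not None and not appending:
        return user_committer
    # Otherwise use whatever the output format class provides, if any.
    if output_format_committer is not None:
        return output_format_committer
    # No output format class set: fall back to the default committer.
    return DEFAULT_COMMITTER


def writes_summary_files(committer):
    """True if the named committer writes Parquet summary files on commit."""
    return committer in SUMMARY_WRITING_COMMITTERS


# Appending with no output format class set reproduces the reported behavior.
chosen = choose_committer("ParquetOutputCommitter", None, appending=True)
print(chosen, writes_summary_files(chosen))
```

      In this model, the same call with `appending=False` keeps the configured committer, and supplying an output format committer restores summary-file writing during appends.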

            People

            • Assignee:
              Cheng Lian
              Reporter:
              Cheng Lian