SPARK-26081

Do not write empty files by text datasources

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Docs Text:
      In Spark 3.0, when empty partitions are written to a CSV, JSON or text data source, they no longer produce an empty file, but instead produce no file at all.

      Description

      Text-based datasources such as CSV, JSON and Text produce empty files for empty partitions. This introduces additional overhead when such files are opened and read back. In the current implementation of OutputWriter, the output stream is created eagerly, even if no records are ever written to it. Creation of the stream can therefore be postponed until the first write.
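
      The idea of postponing stream creation can be illustrated with a minimal, self-contained sketch. This is not Spark's OutputWriter code: `LazyTextWriter` and `write_partitions` are hypothetical names introduced only for illustration, and plain local files stand in for the datasource output.

```python
import os
import tempfile


class LazyTextWriter:
    """Hypothetical writer that opens its file only on the first write."""

    def __init__(self, path):
        self.path = path
        self._stream = None  # nothing is created on disk yet

    def write(self, record):
        if self._stream is None:
            # Deferred creation: the file appears only once there is data.
            self._stream = open(self.path, "w", encoding="utf-8")
        self._stream.write(record + "\n")

    def close(self):
        # Closing a writer that never wrote anything is a no-op.
        if self._stream is not None:
            self._stream.close()
            self._stream = None


def write_partitions(out_dir, partitions):
    """Write each partition to its own part file; return the files created."""
    for index, records in enumerate(partitions):
        writer = LazyTextWriter(os.path.join(out_dir, f"part-{index:05d}.txt"))
        try:
            for record in records:
                writer.write(record)
        finally:
            writer.close()
    return sorted(os.listdir(out_dir))


# Three partitions, the middle one empty (illustrative data only).
with tempfile.TemporaryDirectory() as tmp:
    created = write_partitions(tmp, [["a", "b"], [], ["c"]])
```

      In this sketch the empty middle partition leaves no file behind, so `created` contains only the part files for the first and third partitions. An eager writer that opened its stream in the constructor would have produced a zero-byte file for the empty partition as well.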

            People

            • Assignee: Maxim Gekk (maxgekk)
            • Reporter: Maxim Gekk (maxgekk)