Details

- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 2.4.1
- Fix Version/s: None
- Component/s: None
- Environment: Local pyspark 2.4.1
Description
When writing a DataFrame to CSV with the header option set to true,
the header is not written if the DataFrame is empty.
This breaks downstream processes that read the CSV back.
Example (please notice the limit(0) in the second write):
```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.1
      /_/

Using Python version 2.7.17 (default, Nov  7 2019 10:07:09)
SparkSession available as 'spark'.
>>> df1 = spark.sql("SELECT 1 as a")
>>> df1.limit(1).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
+---+
|  a|
+---+
|  1|
+---+

>>> df1.limit(0).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
++
||
++
++
```
Expected behavior:
```
>>> df1.limit(0).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
+---+
|  a|
+---+
+---+
```
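For comparison, a plain CSV writer emits the header row even when there are zero data rows, which is the roundtrip behavior expected above. A minimal sketch in stdlib Python (no Spark; the function name and column names are illustrative, not part of any Spark API):

```python
import csv
import io

def write_csv_with_header(columns, rows):
    """Serialize rows to CSV text, always emitting the header line --
    even for an empty row set, so a later read can recover the schema."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)   # header is written unconditionally
    writer.writerows(rows)     # no-op when rows is empty
    return buf.getvalue()

# Non-empty input: header plus one data row.
print(write_csv_with_header(["a"], [[1]]))

# Empty input: header only, not an empty file.
print(write_csv_with_header(["a"], []))
```

Until the fix lands, a caller can use this pattern to write the header file manually when the DataFrame is empty, so that readers relying on the header do not fail.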
Attachments
Issue Links
- duplicates: SPARK-26208 "Empty dataframe does not roundtrip for csv with header" (Resolved)