Details

- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Duplicate
- Affects Version/s: 2.4.1
- Fix Version/s: None
- Component/s: None
- Environment: Local pyspark 2.4.1
Description
When writing a DataFrame to CSV with the header option set to true,
the header is not written if the DataFrame is empty.
This breaks downstream processes that read the CSV back.
Example (please notice the limit(0) in the second write):
```
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.1
      /_/

Using Python version 2.7.17 (default, Nov  7 2019 10:07:09)
SparkSession available as 'spark'.
>>> df1 = spark.sql("SELECT 1 as a")
>>> df1.limit(1).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
+---+
|  a|
+---+
|  1|
+---+

>>> df1.limit(0).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
++
||
++
++
```
Expected behavior:
```
>>> df1.limit(0).write.mode("OVERWRITE").option("Header", True).csv("data/test/csv")
>>> spark.read.option("Header", True).csv("data/test/csv").show()
+---+
|  a|
+---+
+---+
```
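For comparison, a plain CSV writer emits the header row even when there are zero data rows, which is the roundtrip behavior expected above. A minimal sketch in stdlib Python (no Spark; the function name and column names are illustrative, not part of any Spark API):

```python
import csv
import io

def write_csv_with_header(columns, rows):
    """Serialize rows to CSV text, always emitting the header line --
    even for an empty row set, so a later read can recover the schema."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)   # header is written unconditionally
    writer.writerows(rows)     # no-op when rows is empty
    return buf.getvalue()

# Non-empty input: header plus one data row.
print(write_csv_with_header(["a"], [[1]]))

# Empty input: header only, not an empty file.
print(write_csv_with_header(["a"], []))
```

Until the fix lands, a caller can use this pattern to write the header file manually when the DataFrame is empty, so that readers relying on the header do not fail.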
Attachments
Issue Links
- duplicates: SPARK-26208 "Empty dataframe does not roundtrip for csv with header" (Resolved)