Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Cannot Reproduce
- Affects Version/s: 2.0.0
- Fix Version/s: None
- Component/s: None
Description
Currently, the CSV data source fails to write and then read back empty data.
The code below:
val emptyDf = spark.range(10).filter(_ => false)
emptyDf.write
  .format("csv")
  .save(path.getCanonicalPath)
val copyEmptyDf = spark.read
  .format("csv")
  .load(path.getCanonicalPath)
copyEmptyDf.show()
throws an exception below:
Can not create a Path from an empty string
java.lang.IllegalArgumentException: Can not create a Path from an empty string
  at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
  at org.apache.hadoop.fs.Path.<init>(Path.java:135)
  at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:241)
  at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
  at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:987)
  at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:987)
  at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
  at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
  at scala.Option.map(Option.scala:146)
Note that this is a different case from the one produced by the data below:
val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
In this case, no writer is initialised or created (there are no calls to WriterContainer.writeRows()).
Perhaps the CSV data source should be able to write and read back a header for the schema even when the data itself is empty. This round-trip already works for Parquet and JSON, but not for CSV.
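For reference, the second (non-failing) case above can be reproduced with a sketch like the following. The schema here is a hypothetical two-column example, since the original snippet leaves `schema` undefined; `spark` is assumed to be an active SparkSession, as in a spark-shell.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical schema for illustration only; the report does not define one.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = true),
  StructField("name", StringType, nullable = true)
))

// An empty DataFrame built from an empty RDD: Spark knows the schema,
// but no partition ever produces a row, so no writer task is started
// (hence no call into WriterContainer.writeRows()).
val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
```

Because the DataFrame carries an explicit schema, a CSV writer that emitted a header even for zero rows could round-trip the column names, which is the behaviour suggested above.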
Attachments
Issue Links
- is duplicated by
  - SPARK-20035 Spark 2.0.2 writes empty file if no record is in the dataset (Resolved)
  - SPARK-15475 Add tests for writing and reading back empty data for Parquet, Json and Text data sources (Resolved)
- is related to
  - SPARK-26208 Empty dataframe does not roundtrip for csv with header (Resolved)
- links to