Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Cannot Reproduce
- Affects Version/s: 2.0.0
- Fix Version/s: None
- Component/s: None
Description
Currently, the CSV data source fails to write and then read back empty data.
The code below:
val emptyDf = spark.range(10).filter(_ => false)
emptyDf.write
  .format("csv")
  .save(path.getCanonicalPath)
val copyEmptyDf = spark.read
  .format("csv")
  .load(path.getCanonicalPath)
copyEmptyDf.show()
throws an exception below:
Can not create a Path from an empty string
java.lang.IllegalArgumentException: Can not create a Path from an empty string
  at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
  at org.apache.hadoop.fs.Path.<init>(Path.java:135)
  at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:241)
  at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
  at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:987)
  at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:987)
  at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
  at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:178)
  at scala.Option.map(Option.scala:146)
Note that this is a different case from the one produced by the data below:
val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
In this case, no writer is initialised or created (there are no calls to WriterContainer.writeRows()).
Perhaps the CSV data source should be able to write and read back a header for the schema even when the data itself is empty. This round-trip already works for Parquet and JSON, but not for CSV.
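For reference, the second (non-failing) case above can be reproduced with a sketch like the following. The schema here is a hypothetical two-column example, since the original snippet leaves `schema` undefined; `spark` is assumed to be an active SparkSession, as in a spark-shell.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical schema for illustration only; the report does not define one.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = true),
  StructField("name", StringType, nullable = true)
))

// An empty DataFrame built from an empty RDD: Spark knows the schema,
// but no partition ever produces a row, so no writer task is started
// (hence no call into WriterContainer.writeRows()).
val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
```

Because the DataFrame carries an explicit schema, a CSV writer that emitted a header even for zero rows could round-trip the column names, which is the behaviour suggested above.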
Attachments
Issue Links
- is duplicated by
  - SPARK-20035 Spark 2.0.2 writes empty file if no record is in the dataset (Resolved)
  - SPARK-15475 Add tests for writing and reading back empty data for Parquet, Json and Text data sources (Resolved)
- is related to
  - SPARK-26208 Empty dataframe does not roundtrip for csv with header (Resolved)
- links to