Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Duplicate
- Affects Version: 2.0.0
- Fix Version: None
Description
I noticed this when running tests after pulling and building @lw-lin's PR (https://github.com/apache/spark/pull/14118). I don't think anything is wrong with his PR; rather, the fix made in spark-csv for this issue was never ported to Spark 2.x when Databricks' spark-csv was merged into Spark 2 back in January. https://github.com/databricks/spark-csv/issues/308 was fixed in spark-csv after that merge.
The problem is that if I try to write a DataFrame containing a date column out to a CSV using something like this:
repartitionDf.write.format("csv") // .format(DATABRICKS_CSV)
  .option("delimiter", "\t")
  .option("header", "false")
  .option("nullValue", "?")
  .option("dateFormat", "yyyy-MM-dd'T'HH:mm:ss")
  .option("escape", "\\")
  .save(tempFileName)
then my unit test (which passed under Spark 1.6.2) fails using the Spark 2.1.0 snapshot build that I made today. The DataFrame contained three values in a date column:
Expected:
  "[2012-01-03T09:12:00
  ?
  2015-02-23T18:00:]00"
but got:
  "[1325610720000000
  ?
  14247432000000]00"
(the square brackets in the assertion output mark the portions that differ)
This means that while the null value is being exported correctly, the specified dateFormat is not being used to format the date. Instead, the raw number of microseconds since the epoch is being written.
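As a sanity check (plain JVM code, not part of the original report), the first raw value does decode as microseconds since the Unix epoch to the expected date, offset by what is presumably the test machine's local timezone (UTC-8):

```scala
import java.time.Instant

// First value from the failing test output, interpreted as
// microseconds since the Unix epoch.
val micros = 1325610720000000L

// Convert to whole seconds and decode. This yields
// 2012-01-03T17:12:00Z, i.e. the expected local date
// 2012-01-03T09:12:00 shifted by a UTC-8 offset.
val instant = Instant.ofEpochSecond(micros / 1000000L)
println(instant)
```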
Issue Links
- duplicates SPARK-16216: CSV data source does not write date and timestamp correctly (Resolved)