Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Incomplete
- Affects Version/s: 2.1.0
- Fix Version/s: None
Description
The current FastDateFormat-based parser cannot properly parse dates and timestamps, and it does not conform to ISO 8601.
For example, I need to process user.csv like this:
id,project,started,ended
sergey.rubtsov,project0,12/12/2012,10/10/2015
When I add date format options:
Dataset<Row> users = spark.read()
    .format("csv")
    .option("mode", "PERMISSIVE")
    .option("header", "true")
    .option("inferSchema", "true")
    .option("dateFormat", "dd/MM/yyyy")
    .load("src/main/resources/user.csv");
users.printSchema();
the expected schema is:
root
 |-- id: string (nullable = true)
 |-- project: string (nullable = true)
 |-- started: date (nullable = true)
 |-- ended: date (nullable = true)
but the actual result is:
root
 |-- id: string (nullable = true)
 |-- project: string (nullable = true)
 |-- started: string (nullable = true)
 |-- ended: string (nullable = true)
This means that the date columns are inferred as strings and the "dateFormat" option is ignored during schema inference.
If I add the option
.option("timestampFormat", "dd/MM/yyyy")
the result is:
root
 |-- id: string (nullable = true)
 |-- project: string (nullable = true)
 |-- started: timestamp (nullable = true)
 |-- ended: timestamp (nullable = true)
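The inference this report expects can be sketched outside Spark. The following is a minimal illustration only, not Spark's implementation: the class DateInferenceSketch and the method inferType are hypothetical names, and java.time is used here purely for convenience. It tries the user-supplied date pattern first and falls back to string when the value does not match.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of Spark: infers a type name from one sample value.
class DateInferenceSketch {

    // Returns "date" if the value parses with the given pattern, otherwise "string".
    static String inferType(String value, String datePattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(datePattern);
        try {
            LocalDate.parse(value, formatter);
            return "date";
        } catch (DateTimeParseException e) {
            return "string";
        }
    }

    public static void main(String[] args) {
        // Same pattern as the "dateFormat" option in the report.
        String pattern = "dd/MM/yyyy";
        // The data row from user.csv in the report.
        String[] row = {"sergey.rubtsov", "project0", "12/12/2012", "10/10/2015"};
        for (String value : row) {
            System.out.println(value + " -> " + inferType(value, pattern));
        }
    }
}
```

Under this sketch the "started" and "ended" values are classified as dates, which matches the schema the report expects when "dateFormat" is set.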
Issue Links
- is duplicated by
  SPARK-25517 Spark DataFrame option inferSchema="true", dataFormat=MM/dd/yyyy, fails to detect date type from the csv file while reading (Resolved)
- is related to
  SPARK-26178 Use java.time API for parsing timestamps and dates from CSV (Resolved)
- links to