[SPARK-19228] inferSchema function processed csv date column as string and "dateFormat" DataSource option is ignored - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.1.0
Fix Version/s: None
Component/s: Input/Output, SQL
Labels:
- bulk-closed

Description

The current FastDateFormat-based parser does not parse dates and timestamps correctly and does not conform to ISO 8601.
For example, I need to process a user.csv file like this:

id,project,started,ended
sergey.rubtsov,project0,12/12/2012,10/10/2015

When I add the date format option:

        Dataset<Row> users = spark.read().format("csv")
                .option("mode", "PERMISSIVE").option("header", "true")
                .option("inferSchema", "true").option("dateFormat", "dd/MM/yyyy")
                .load("src/main/resources/user.csv");
        users.printSchema();

the expected schema is:

root
 |-- id: string (nullable = true)
 |-- project: string (nullable = true)
 |-- started: date (nullable = true)
 |-- ended: date (nullable = true)

but the actual result is:

root
 |-- id: string (nullable = true)
 |-- project: string (nullable = true)
 |-- started: string (nullable = true)
 |-- ended: string (nullable = true)

This means that the dates are processed as strings and the "dateFormat" option is ignored.
If I add the option

.option("timestampFormat", "dd/MM/yyyy")

the result is:

root
 |-- id: string (nullable = true)
 |-- project: string (nullable = true)
 |-- started: timestamp (nullable = true)
 |-- ended: timestamp (nullable = true)
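
The expected behaviour can be illustrated with a minimal, Spark-free sketch. It is not Spark's inference code: the class DateInferenceSketch and the helper inferType are hypothetical names introduced only for this illustration, and it uses java.time (the API mentioned in the related SPARK-26178) rather than FastDateFormat. It assumes a column should be typed as date only when every value parses with the user-supplied pattern, and as string otherwise.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

public class DateInferenceSketch {

    // Hypothetical helper, not part of Spark: returns "date" when every value
    // in the column parses with the given pattern, otherwise "string".
    static String inferType(List<String> values, String pattern) {
        if (values.isEmpty()) {
            return "string"; // nothing to infer from, so stay with the default
        }
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
        for (String value : values) {
            try {
                LocalDate.parse(value, formatter);
            } catch (DateTimeParseException e) {
                return "string"; // one non-matching value rules out "date"
            }
        }
        return "date";
    }

    public static void main(String[] args) {
        String pattern = "dd/MM/yyyy"; // same value as the dateFormat option above

        // Columns from the sample user.csv row in this report
        System.out.println("id: " + inferType(Arrays.asList("sergey.rubtsov"), pattern));  // id: string
        System.out.println("started: " + inferType(Arrays.asList("12/12/2012"), pattern)); // started: date
        System.out.println("ended: " + inferType(Arrays.asList("10/10/2015"), pattern));   // ended: date
    }
}
```

With the sample row, this yields string for id and date for started and ended, which matches the expected schema above.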

Issue Links

is duplicated by

SPARK-25517 Spark DataFrame option inferSchema="true", dataFormat=MM/dd/yyyy, fails to detect date type from the csv file while reading

Resolved

is related to

SPARK-26178 Use java.time API for parsing timestamps and dates from CSV

Resolved

links to

[Github] Pull Request #16735 (sergey-rubtsov)

[Github] Pull Request #20140 (sergey-rubtsov)

[Github] Pull Request #21363 (sergey-rubtsov)

People

Assignee: Unassigned

Reporter: Sergey Rubtsov

Votes: 3

Watchers: 7

Dates

Created: 15/Jan/17 15:20

Updated: 12/Dec/22 18:11

Resolved: 25/May/21 01:38