[SPARK-23612] Specify formats for individual DateType and TimestampType columns in schemas - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Incomplete
Affects Version/s: 2.3.0
Fix Version/s: None
Component/s: PySpark, SQL
Labels:
- DataType
- bulk-closed
- date
- spree
- sql

Description

https://github.com/apache/spark/blob/407f67249639709c40c46917700ed6dd736daa7d/python/pyspark/sql/types.py#L162-L200

It would be very helpful to be able to specify a format for each individual column in a schema when reading CSV files, rather than a single format that applies to all of them:

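```python
from pyspark.sql.types import StructType, StructField, DateType

# Currently can only do something like this (one format for every date column):
spark.read.option("dateFormat", "yyyyMMdd").csv(...)

# Would like to be able to do something like this (proposed: a per-column format on DateType):
schema = StructType([
    StructField("date1", DateType(format="MM/dd/yyyy"), True),
    StructField("date2", DateType(format="yyyyMMdd"), True)
])

spark.read.schema(schema).csv(...)
```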

Thanks for any help or input!
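To make the requested behaviour concrete, here is a minimal plain-Python sketch (no Spark involved) in which each date column is parsed with its own pattern. The names `COLUMN_FORMATS` and `parse_dates` are hypothetical, and the patterns use Python `strptime` codes rather than Spark's Java-style patterns: `"MM/dd/yyyy"` corresponds to `"%m/%d/%Y"` and `"yyyyMMdd"` to `"%Y%m%d"`.

```python
from datetime import datetime

# Hypothetical mapping from column name to that column's own date pattern
# (Python strptime codes standing in for Spark's "MM/dd/yyyy" and "yyyyMMdd").
COLUMN_FORMATS: dict[str, str] = {
    "date1": "%m/%d/%Y",
    "date2": "%Y%m%d",
}


def parse_dates(row: dict[str, str], formats: dict[str, str]) -> dict[str, object]:
    """Parse every column that has a format; pass other columns through unchanged.

    Empty strings become None, mimicking a nullable date column.
    """
    parsed: dict[str, object] = {}
    for name, raw in row.items():
        fmt = formats.get(name)
        if fmt is None:
            parsed[name] = raw  # no per-column format: keep the raw string
        elif raw == "":
            parsed[name] = None  # treat an empty field as null
        else:
            parsed[name] = datetime.strptime(raw, fmt).date()
    return parsed


row = {"id": "1", "date1": "03/06/2018", "date2": "20180306"}
print(parse_dates(row, COLUMN_FORMATS))
```

Within Spark itself, a similar per-column effect can be approximated today by declaring these columns as `StringType` in the schema and converting each one after the read with `pyspark.sql.functions.to_date(column, format)`, passing that column's own pattern.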

People

Assignee: Unassigned

Reporter: Patrick Young

Votes: 0

Watchers: 1

Dates

Created: 06/Mar/18 16:29

Updated: 08/Oct/19 05:44

Resolved: 08/Oct/19 05:44
