[SPARK-23436] Incorrect Date column Inference in partition discovery - ASF JIRA


Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.2.1
Fix Version/s: 2.3.1, 2.4.0
Component/s: SQL
Labels: None

Description

If a partition column value looks like a partial date/timestamp, for example:

2018-01-01-23

i.e. a timestamp truncated to the hour, then the data type of the partition column is automatically inferred as date, but the values are loaded as null.
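The report does not say why such a value is inferred as a date, but the JDK alone shows how a string like this can pass one date check and fail another. A minimal sketch, assuming (this report does not confirm it) that the inference step uses a lenient parser such as `java.text.SimpleDateFormat`, whose `parse` may stop before the end of its input, while the loaded value comes from a strict whole-string conversion, approximated here by `java.time.LocalDate.parse`. `PartialDateDemo` and its methods are hypothetical names.

```scala
import java.text.{ParsePosition, SimpleDateFormat}
import java.time.LocalDate
import scala.util.Try

// Hypothetical demo object; illustrates JDK parsing behaviour, not Spark's code.
object PartialDateDemo {
  // Returns how many characters a lenient "yyyy-MM-dd" parse consumed, or -1 if it failed.
  // SimpleDateFormat.parse may stop early and silently ignore trailing text.
  def lenientConsumed(raw: String): Int = {
    val pos = new ParsePosition(0)
    val parsed = new SimpleDateFormat("yyyy-MM-dd").parse(raw, pos)
    if (parsed == null) -1 else pos.getIndex
  }

  // Assumed stand-in for a strict conversion: LocalDate.parse must consume the whole string.
  def strictParses(raw: String): Boolean = Try(LocalDate.parse(raw)).isSuccess

  def main(args: Array[String]): Unit = {
    val raw = "2018-01-01-23"
    println(s"lenient parse consumed ${lenientConsumed(raw)} of ${raw.length} chars")
    println(s"strict parse succeeded: ${strictParses(raw)}")
  }
}
```

The lenient parse stops after `2018-01-01` (10 of the 13 characters) and ignores the trailing `-23`, whereas the strict parse rejects the string outright.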

Here is example code to reproduce this behaviour:

import spark.implicits._ // needed for .toDF on a local Seq (already imported in spark-shell)

val data = Seq(("1", "2018-01", "2018-01-01-04", "test")).toDF("id", "date_month", "data_hour", "data")
data.write.partitionBy("id", "date_month", "data_hour").parquet("output/test")

val input = spark.read.parquet("output/test")
input.printSchema()
input.show()


## Result

root
 |-- data: string (nullable = true)
 |-- id: integer (nullable = true)
 |-- date_month: string (nullable = true)
 |-- data_hour: date (nullable = true)

+----+---+----------+---------+
|data| id|date_month|data_hour|
+----+---+----------+---------+
|test|  1|   2018-01|     null|
+----+---+----------+---------+
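To connect the schema and the `null` value shown in the result, here is a hypothetical, heavily simplified model of partition-value inference (not Spark's code). It only distinguishes integers, dates, and strings (Spark also considers other types), assumes a lenient date check during inference and a strict conversion when loading the value, and treats `guarded = true` as one possible fix: accept the date type only if the strict conversion also succeeds. All names are illustrative.

```scala
import java.text.SimpleDateFormat
import java.time.LocalDate
import scala.util.Try

// Hypothetical, simplified model of partition-value type inference (not Spark's implementation).
object PartitionInferenceModel {
  sealed trait InferredType
  case object InferredInt extends InferredType
  case object InferredDate extends InferredType
  case object InferredString extends InferredType

  // Assumed lenient check: SimpleDateFormat.parse may ignore trailing characters.
  private def lenientDate(raw: String): Boolean =
    Try(new SimpleDateFormat("yyyy-MM-dd").parse(raw)).isSuccess

  // Assumed strict conversion used when loading: the whole string must be a valid date.
  private def strictDate(raw: String): Option[LocalDate] =
    Try(LocalDate.parse(raw)).toOption

  // Infer a type for one raw partition value; `guarded` additionally requires the strict check.
  def inferType(raw: String, guarded: Boolean): InferredType =
    if (Try(raw.toInt).isSuccess) InferredInt
    else if (lenientDate(raw) && (!guarded || strictDate(raw).isDefined)) InferredDate
    else InferredString

  // Convert the raw value to the inferred type; a failed strict date conversion yields null.
  def load(raw: String, t: InferredType): Any = t match {
    case InferredInt    => raw.toInt
    case InferredDate   => strictDate(raw).orNull
    case InferredString => raw
  }

  def main(args: Array[String]): Unit = {
    // Partition values taken from the reproduction above.
    val partitions = Seq("id" -> "1", "date_month" -> "2018-01", "data_hour" -> "2018-01-01-04")
    for (guarded <- Seq(false, true); (name, raw) <- partitions) {
      val t = inferType(raw, guarded)
      println(s"guarded=$guarded $name: $t -> ${load(raw, t)}")
    }
  }
}
```

With `guarded = false` the model mirrors the reported result (`id` integer, `date_month` string, `data_hour` date with value `null`); with `guarded = true`, `data_hour` falls back to a string and keeps `2018-01-01-04`.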


Issue Links

links to

[Github] Pull Request #20621 (mgaido91)

[Github] Pull Request #20764 (gatorsmile)


People

Assignee: Marco Gaido

Reporter: Apoorva Sareen

Votes: 0

Watchers: 4

Dates

Created: 15/Feb/18 15:16

Updated: 09/Mar/18 12:42

Resolved: 20/Feb/18 05:57