[SPARK-22165] Type conflicts between dates, timestamps and date in partition column - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.1.1, 2.2.0, 2.3.0
Fix Version/s: 2.3.0
Component/s: SQL
Labels:
- release-notes

Description

It looks like we have some bugs when resolving type conflicts in partition columns. I found a few corner cases, shown below:

Case 1: timestamp should be inferred but date type is inferred.

val df = Seq((1, "2015-01-01"), (2, "2016-01-01 00:00:00")).toDF("i", "ts")
df.write.format("parquet").partitionBy("ts").save("/tmp/foo")
spark.read.load("/tmp/foo").printSchema()

root
 |-- i: integer (nullable = true)
 |-- ts: date (nullable = true)
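
To illustrate why timestamp is the expected type here, the following minimal sketch checks the two partition values with `java.sql.Date` and `java.sql.Timestamp`. This is not Spark's partition value parser; `Case1Sketch` and `toTimestamp` are hypothetical names used only for illustration. The point is that only one of the values is a valid date, while both can be represented as timestamps if a date is widened to midnight.

```scala
import java.sql.{Date, Timestamp}
import scala.util.Try

// Hypothetical illustration; java.sql parsing stands in for Spark's own logic.
object Case1Sketch {
  val values: Seq[String] = Seq("2015-01-01", "2016-01-01 00:00:00")

  // Which values parse as a plain date (yyyy-[m]m-[d]d)?
  val asDate: Seq[Boolean] = values.map(v => Try(Date.valueOf(v)).isSuccess)

  // Parse as a timestamp; if that fails, treat the value as a date at midnight.
  def toTimestamp(v: String): Option[Timestamp] =
    Try(Timestamp.valueOf(v))
      .orElse(Try(Timestamp.valueOf(v + " 00:00:00")))
      .toOption

  val asTimestamp: Seq[Option[Timestamp]] = values.map(toTimestamp)
}
```

Under these assumptions, date cannot hold every value in the column, whereas timestamp can.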

Case 2: decimal should be inferred but integer is inferred.

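// "1" * 30 repeats the string, giving a 30-digit partition value that fits neither integer nor long.
val df = Seq((1, "1"), (2, "1" * 30)).toDF("i", "decimal")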
df.write.format("parquet").partitionBy("decimal").save("/tmp/bar")
spark.read.load("/tmp/bar").printSchema()

root
 |-- i: integer (nullable = true)
 |-- decimal: integer (nullable = true)
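
As a quick check of why integer cannot be right for this column, the sketch below tests the two partition values in plain Scala, without Spark. `Case2Sketch` is a hypothetical name; the code only shows that the 30-digit value overflows both `Int` and `Long` but is representable as a decimal with precision 30.

```scala
import scala.util.Try

// Hypothetical illustration in plain Scala; no Spark classes are involved.
object Case2Sketch {
  val values: Seq[String] = Seq("1", "1" * 30)

  // Does each value fit in a 32-bit or 64-bit integer?
  val fitsInt: Seq[Boolean]  = values.map(v => Try(v.toInt).isSuccess)
  val fitsLong: Seq[Boolean] = values.map(v => Try(v.toLong).isSuccess)

  // Both values are representable as arbitrary-precision decimals.
  val asDecimal: Seq[BigDecimal] = values.map(v => BigDecimal(v))
}
```

Here `Case2Sketch.asDecimal(1).precision` is 30, which is within the 38 digits that Spark's decimal type supports.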

It looks like we should de-duplicate the type resolution logic where possible, rather than relying on a separate, numeric-precedence-style comparison alone.
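
The difference between the two approaches can be sketched in plain Scala. This is a simplified model, not Spark's source: it assumes a resolver that ranks types by their position in a numeric-only precedence list, so any type missing from the list ranks -1. Under that assumption it reproduces both results reported above. It then shows a single pairwise widening rule that handles both cases. All names (`ColType`, `byPrecedence`, `wider`, `resolve`) are hypothetical.

```scala
// Hypothetical stand-ins for SQL data types; these are not Spark's classes.
sealed trait ColType
case object NullT extends ColType
case object IntT extends ColType
case object LongT extends ColType
case object DoubleT extends ColType
case object DecimalT extends ColType
case object DateT extends ColType
case object TimestampT extends ColType
case object StringT extends ColType

object PartitionTypeSketch {
  // Assumed numeric-only precedence list; DateT, TimestampT and DecimalT
  // are absent, so indexOf returns -1 for each of them.
  val precedence: Seq[ColType] = Seq(NullT, IntT, LongT, DoubleT, StringT)

  // Precedence-only resolution: pick the highest-ranked type.
  // On ties, maxBy keeps the first element it saw.
  def byPrecedence(types: Seq[ColType]): ColType =
    types.maxBy(t => precedence.indexOf(t))

  // One shared widening rule for any pair of types; falls back to string.
  def wider(a: ColType, b: ColType): ColType = (a, b) match {
    case (x, y) if x == y                          => x
    case (NullT, other)                            => other
    case (other, NullT)                            => other
    case (IntT, LongT) | (LongT, IntT)             => LongT
    case (IntT | LongT | DecimalT, DoubleT)        => DoubleT
    case (DoubleT, IntT | LongT | DecimalT)        => DoubleT
    case (IntT | LongT, DecimalT)                  => DecimalT
    case (DecimalT, IntT | LongT)                  => DecimalT
    case (DateT, TimestampT) | (TimestampT, DateT) => TimestampT
    case _                                         => StringT
  }

  // Fold the per-partition types through the shared rule.
  def resolve(types: Seq[ColType]): ColType =
    types.foldLeft[ColType](NullT)(wider)
}
```

In this model, `byPrecedence(Seq(DateT, TimestampT))` yields `DateT` and `byPrecedence(Seq(IntT, DecimalT))` yields `IntT`, matching Case 1 and Case 2, while `resolve` yields `TimestampT` and `DecimalT` respectively.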

Issue Links

links to

[GitHub] Pull Request #19389 (HyukjinKwon)

People

Assignee: Hyukjin Kwon
Reporter: Hyukjin Kwon
Votes: 0
Watchers: 2

Dates

Created: 29/Sep/17 05:27
Updated: 12/Dec/22 18:10
Resolved: 21/Nov/17 19:54