Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.1.1, 2.2.0, 2.3.0
Description
It looks like we have some bugs when resolving type conflicts in partition columns. I found a few corner cases, as below:
Case 1: the timestamp type should be inferred, but the date type is inferred.
val df = Seq((1, "2015-01-01"), (2, "2016-01-01 00:00:00")).toDF("i", "ts") df.write.format("parquet").partitionBy("ts").save("/tmp/foo") spark.read.load("/tmp/foo").printSchema()
root
 |-- i: integer (nullable = true)
 |-- ts: date (nullable = true)
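One way to see that the problem is in the conflict-resolution step rather than in per-value inference is to partition on each value separately (spark-shell style check; the /tmp paths below are just placeholders):

Seq((2, "2016-01-01 00:00:00")).toDF("i", "ts").write.format("parquet").partitionBy("ts").save("/tmp/foo_ts")
spark.read.load("/tmp/foo_ts").printSchema()    // ts should come back as timestamp
Seq((1, "2015-01-01")).toDF("i", "ts").write.format("parquet").partitionBy("ts").save("/tmp/foo_date")
spark.read.load("/tmp/foo_date").printSchema()  // ts should come back as date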
Case 2: the decimal type should be inferred, but the integer type is inferred.
val df = Seq((1, "1"), (2, "1" * 30)).toDF("i", "decimal") df.write.format("parquet").partitionBy("decimal").save("/tmp/bar") spark.read.load("/tmp/bar").printSchema()
root
 |-- i: integer (nullable = true)
 |-- decimal: integer (nullable = true)
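For clarity, "1" * 30 in Scala repeats the string, so the second partition value is a 30-digit number that does not fit in int or long and needs a decimal type; a quick plain-Scala check:

val v = "1" * 30                   // 30-character string "111...1"
println(v.length)                  // 30
println(scala.util.Try(v.toLong))  // Failure: exceeds Long.MaxValue (19 digits)
println(BigDecimal(v).precision)   // 30, i.e. roughly a decimal(30,0)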
It looks like we should de-duplicate the type resolution logic where possible, rather than keeping a separate numeric-precedence-like comparison for partition columns alone.
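As a rough illustration only (this is not Spark's actual code; the type names and helpers below are made up for the sketch), a shared "infer one value, then widen on conflict" path could cover both cases, whereas a numeric-only precedence list cannot express date vs. timestamp or int vs. decimal widening:

// Toy stand-ins for the relevant Spark SQL partition column types.
sealed trait PType
case object PInt       extends PType
case object PLong      extends PType
case object PDecimal   extends PType
case object PDate      extends PType
case object PTimestamp extends PType
case object PString    extends PType

// Infer the narrowest type that can represent a single partition value.
def inferOne(raw: String): PType = {
  def ok[A](parse: => A): Boolean = scala.util.Try(parse).isSuccess
  if (ok(raw.toInt)) PInt
  else if (ok(raw.toLong)) PLong
  else if (ok(BigDecimal(raw))) PDecimal
  else if (ok(java.sql.Date.valueOf(raw))) PDate
  else if (ok(java.sql.Timestamp.valueOf(raw))) PTimestamp
  else PString
}

// Resolve a conflict by widening to a common type, instead of a
// numeric-only precedence comparison; anything else falls back to string.
def widen(a: PType, b: PType): PType = (a, b) match {
  case (x, y) if x == y                                     => x
  case (PInt, PLong) | (PLong, PInt)                        => PLong
  case (PInt | PLong, PDecimal) | (PDecimal, PInt | PLong)  => PDecimal
  case (PDate, PTimestamp) | (PTimestamp, PDate)            => PTimestamp
  case _                                                    => PString
}

// Case 1: date vs. timestamp widens to timestamp.
println(widen(inferOne("2015-01-01"), inferOne("2016-01-01 00:00:00")))  // PTimestamp
// Case 2: int vs. 30-digit decimal widens to decimal.
println(widen(inferOne("1"), inferOne("1" * 30)))                        // PDecimal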