Description
If a partition column holds a partial date/timestamp value,
for example: 2018-01-01-23
where the value is truncated to the hour, the data type of the partitioning column is automatically inferred as date; however, the values are loaded as null.
Here is example code to reproduce this behaviour:
val data = Seq(("1", "2018-01", "2018-01-01-04", "test")).toDF("id", "date_month", "data_hour", "data")
data.write.partitionBy("id","date_month","data_hour").parquet("output/test")
val input = spark.read.parquet("output/test")
input.printSchema()
input.show()

## Result ###
root
 |-- data: string (nullable = true)
 |-- id: integer (nullable = true)
 |-- date_month: string (nullable = true)
 |-- data_hour: date (nullable = true)

+----+---+----------+---------+
|data| id|date_month|data_hour|
+----+---+----------+---------+
|test|  1|   2018-01|     null|
+----+---+----------+---------+
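A possible workaround, sketched here under assumptions rather than offered as a fix for the underlying bug, is to turn off partition column type inference with the Spark SQL setting spark.sql.sources.partitionColumnTypeInference.enabled. According to the Spark SQL documentation, partition columns are then read as strings, so this also affects id, which would no longer be inferred as integer. The object name, application name, local master and overwrite mode below are illustrative choices, not part of the original report.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone wrapper around the reproduction above.
object PartitionInferenceWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-inference-workaround") // illustrative name
      .master("local[*]")                        // assumption: local run
      // Disable automatic type inference for partition columns,
      // so their values are kept as strings when read back.
      .config("spark.sql.sources.partitionColumnTypeInference.enabled", "false")
      .getOrCreate()
    import spark.implicits._

    val data = Seq(("1", "2018-01", "2018-01-01-04", "test"))
      .toDF("id", "date_month", "data_hour", "data")

    // Overwrite so the sketch can be re-run against the same path.
    data.write
      .mode("overwrite")
      .partitionBy("id", "date_month", "data_hour")
      .parquet("output/test")

    val input = spark.read.parquet("output/test")
    input.printSchema() // inspect the types assigned to the partition columns
    input.show()        // inspect whether data_hour keeps its original value

    spark.stop()
  }
}
```

If only data_hour should stay a string while id remains an integer, casting the columns explicitly after the read is another option to consider.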