Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version: 3.0.0
Fix Version: None
Description
How to reproduce:
```
scala> spark.sql("create table t1(d date)")
res2: org.apache.spark.sql.DataFrame = []

scala> spark.sql("insert into table t1 values(cast('2020-08-09' as date))")
res3: org.apache.spark.sql.DataFrame = []

scala> spark.sql("select d from t1").show
+----------+
|         d|
+----------+
|1970-01-01|
+----------+
```
Spark 3.0 introduced DaysWritable, which extends Hive's DateWritable, to handle the date type. DaysWritable.toString() is called to write the value into the Hive table. DateWritable.toString() is defined as:
```java
@Override
public String toString() {
  // For toString, the time does not matter
  return get(false).toString();
}

public Date get(boolean doesTimeMatter) {
  return new Date(daysToMillis(daysSinceEpoch, doesTimeMatter));
}
```
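The interaction can be reproduced outside Hive with a small stand-alone sketch. The `Fake`-prefixed class names below are invented for illustration; the real classes also rebase days between the Julian and Gregorian calendars, which is omitted here:

```scala
import java.sql.Date
import java.util.TimeZone

// Simplified stand-in for Hive's DateWritable (hypothetical class,
// mirrors only the get()/toString() shape relevant to this bug).
class FakeDateWritable(protected var daysSinceEpoch: Int = 0) {
  def get(doesTimeMatter: Boolean): Date =
    new Date(daysSinceEpoch.toLong * 86400000L)
  def get(): Date = get(false)
  // toString reads daysSinceEpoch via get(false), like DateWritable does
  override def toString: String = get(false).toString
}

// Simplified stand-in for Spark 3.0's DaysWritable: only the no-arg
// get() is overridden, so the parent's toString() path is unchanged.
class FakeDaysWritable(val julianDays: Int) extends FakeDateWritable {
  override def get(): Date = new Date(julianDays.toLong * 86400000L)
}

object ToStringDemo extends App {
  TimeZone.setDefault(TimeZone.getTimeZone("UTC")) // deterministic Date.toString
  val w = new FakeDaysWritable(18483) // 2020-08-09 as days since the epoch
  println(w.get())    // 2020-08-09: the no-arg override is used
  println(w.toString) // 1970-01-01: toString() still reads daysSinceEpoch (0)
}
```

Calling the no-arg `get()` returns the right date, while `toString` on the same object still yields the epoch date, which is exactly the mismatch observed in the table above.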
DaysWritable overrides neither toString() nor get(boolean doesTimeMatter). It only overrides get():
```scala
override def get(): Date = new Date(DateWritable.daysToMillis(julianDays))
```
This override does not affect toString(). Since daysSinceEpoch in DateWritable is always left at 0, every call to DaysWritable.toString() returns '1970-01-01', and as a result every date value written to the Hive table ends up as '1970-01-01'.
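For illustration, a minimal sketch of the shape such a fix could take (invented class names, not the actual Spark patch): overriding the get(doesTimeMatter) variant means the inherited toString(), which calls get(false), now sees the subclass's day count instead of the parent's daysSinceEpoch.

```scala
import java.sql.Date
import java.util.TimeZone

// Hypothetical parent mirroring DateWritable's toString()/get() shape.
class SketchDateWritable(protected var daysSinceEpoch: Int = 0) {
  def get(doesTimeMatter: Boolean): Date =
    new Date(daysSinceEpoch.toLong * 86400000L)
  override def toString: String = get(false).toString
}

// Hypothetical subclass: overriding the Boolean variant is enough to
// make the inherited toString() produce the correct date as well.
class SketchDaysWritable(val julianDays: Int) extends SketchDateWritable {
  override def get(doesTimeMatter: Boolean): Date =
    new Date(julianDays.toLong * 86400000L)
}

object FixDemo extends App {
  TimeZone.setDefault(TimeZone.getTimeZone("UTC")) // deterministic Date.toString
  println(new SketchDaysWritable(18483)) // 2020-08-09, not 1970-01-01
}
```

The key design point is that toString() dispatches through get(false), so any subclass that wants a different day representation must override that overload, not just the no-arg get().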