Description
Running the following SQL under Hive 0.14.0+ (tested against 0.14.0 and 1.2.1):
CREATE TABLE ts_test STORED AS PARQUET AS SELECT CAST("2015-01-01 00:00:00" AS TIMESTAMP);
Then read the Parquet file generated by Hive with Spark SQL:
scala> sqlContext.read.parquet("hdfs://localhost:9000/user/hive/warehouse_hive14/ts_test").collect()
res1: Array[org.apache.spark.sql.Row] = Array([2015-01-01 12:00:00.0])
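The 12-hour shift comes from how Spark interprets the INT96 value stored in the file, not from the bytes Hive wrote (Spark 1.4.1 reads the same file correctly). For reference, a minimal, Spark-free sketch of the assumed Hive/Impala INT96 timestamp layout: 8 little-endian bytes of nanoseconds-of-day followed by 4 little-endian bytes of Julian day number. `Int96Sketch` is a hypothetical helper for illustration, not Spark's actual reader:

```scala
import java.nio.{ByteBuffer, ByteOrder}

object Int96Sketch {
  // Assumed layout (Hive/Impala convention): 8 little-endian bytes of
  // nanoseconds since midnight, then 4 little-endian bytes of Julian day.
  def decode(bytes: Array[Byte]): (Int, Long) = {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val nanosOfDay = buf.getLong // first 8 bytes
    val julianDay  = buf.getInt  // last 4 bytes
    (julianDay, nanosOfDay)
  }

  def encode(julianDay: Int, nanosOfDay: Long): Array[Byte] =
    ByteBuffer.allocate(12)
      .order(ByteOrder.LITTLE_ENDIAN)
      .putLong(nanosOfDay)
      .putInt(julianDay)
      .array()
}
```

Note that the on-disk pair is (Julian day, nanoseconds of day); the bug is purely in the arithmetic that turns this pair into an epoch-based timestamp.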
This issue can be easily reproduced with this test case in PR #8392.
Spark 1.4.1 works as expected in this case.
Update:
It seems the problem is that the Julian day conversion in DateTimeUtils is wrong. The following spark-shell session illustrates it:
import java.sql._
import java.util._
import org.apache.hadoop.hive.ql.io.parquet.timestamp._
import org.apache.spark.sql.catalyst.util._

TimeZone.setDefault(TimeZone.getTimeZone("GMT"))

val ts = Timestamp.valueOf("1970-01-01 00:00:00")
val nt = NanoTimeUtils.getNanoTime(ts, false)
val jts = DateTimeUtils.fromJulianDay(nt.getJulianDay, nt.getTimeOfDayNanos)

DateTimeUtils.toJavaTimestamp(jts) // ==> java.sql.Timestamp = 1970-01-01 12:00:00.0
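The 12-hour offset is exactly the half-day difference between midnight-based day numbering (Hive stores the Julian day number of the calendar date plus nanoseconds since midnight) and astronomical Julian dates, which begin at noon UTC. A minimal, Spark-free sketch of the arithmetic, where 2440588 is the Julian day number of 1970-01-01; `JulianSketch` is a hypothetical reimplementation for illustration, not DateTimeUtils' actual code:

```scala
object JulianSketch {
  // Julian day number of 1970-01-01, the Unix epoch.
  val JulianDayOfEpoch = 2440588L
  val SecondsPerDay    = 24L * 60 * 60
  val MicrosPerSecond  = 1000L * 1000

  // Conversion matching Hive's encoding: `nanos` counts from midnight,
  // so no half-day adjustment is needed. Returns microseconds since the
  // Unix epoch.
  def fromJulianDay(day: Int, nanos: Long): Long =
    (day - JulianDayOfEpoch) * SecondsPerDay * MicrosPerSecond + nanos / 1000

  // One way to reproduce the bug: treat the pair as an astronomical
  // Julian date, whose day begins at *noon* UTC. Every timestamp then
  // shifts by +12 hours, as in the session above.
  def fromJulianDayNoonBased(day: Int, nanos: Long): Long =
    fromJulianDay(day, nanos) + (SecondsPerDay / 2) * MicrosPerSecond
}
```

With this arithmetic, the pair (2440588, 0) maps to 1970-01-01 00:00:00 GMT, while the noon-based variant yields the 1970-01-01 12:00:00 value observed in the session.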
Attachments