Description
Spark v2.2.0 introduces TimeZoneAwareExpression, which causes a bug when selecting data from a table with timestamp partitions.
Steps to reproduce:
spark.sql("create table test (foo string) partitioned by (ts timestamp)")
spark.sql("insert into table test partition(ts = 1) values('hi')")
spark.table("test").show()
The root cause is that TableReader.scala#230 tries to cast the string partition value to a timestamp regardless of whether a timeZone has been set on the Cast expression.
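The failure reduces to evaluating a Cast to TimestampType without a timeZoneId. A minimal sketch, assuming the Spark 2.2.x catalyst APIs (run e.g. in a spark-shell):

import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.TimestampType

// No timeZoneId: evaluating the cast reads TimeZoneAwareExpression.timeZone,
// which is None.get, reproducing the stack trace below.
Cast(Literal("1"), TimestampType).eval(null)

// With a timeZoneId supplied, the same cast evaluates (returning null here,
// since "1" is not a parseable timestamp string).
Cast(Literal("1"), TimestampType, Option("UTC")).eval(null)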
Here is the error stack trace:
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression$class.timeZone(datetimeExpressions.scala:46)
at org.apache.spark.sql.catalyst.expressions.Cast.timeZone$lzycompute(Cast.scala:172)
at org.apache.spark.sql.catalyst.expressions.Cast.timeZone(Cast.scala:172)
at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToTimestamp$1$$anonfun$apply$24.apply(Cast.scala:253)
at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToTimestamp$1$$anonfun$apply$24.apply(Cast.scala:253)
at org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$buildCast(Cast.scala:201)
at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToTimestamp$1.apply(Cast.scala:253)
at org.apache.spark.sql.catalyst.expressions.Cast.nullSafeEval(Cast.scala:533)
at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:327)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$fillPartitionKeys$1$1.apply(TableReader.scala:230)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$fillPartitionKeys$1$1.apply(TableReader.scala:228)
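A possible fix direction, sketched here as a hypothetical helper rather than the actual TableReader.scala change, is to thread the session-local time zone into the partition-key cast:

import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.DataType

// Hypothetical helper illustrating the idea; fillPartitionKeys in
// TableReader.scala would build the Cast with the time zone instead.
def castPartitionValue(raw: String, desiredType: DataType, timeZoneId: String): Any =
  Cast(Literal(raw), desiredType, Option(timeZoneId)).eval(null)

With the timeZoneId supplied, time-zone-aware casts such as string to timestamp no longer fail on the missing time zone.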