Spark / SPARK-21739

timestamp partition would fail in v2.2.0


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.1, 2.3.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark v2.2.0 introduced TimeZoneAwareExpression, which causes queries to fail when selecting data from a table with a timestamp partition column.
      Steps to reproduce:

      spark.sql("create table test (foo string) partitioned by (ts timestamp)")
      spark.sql("insert into table test partition(ts = 1) values('hi')")
      spark.table("test").show()
      

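      The root cause is that TableReader.scala (line 230) casts the partition value string to a timestamp without checking whether a time zone is set. The Cast is created without a timeZoneId, so evaluating it calls None.get.

      Here is the error stack trace: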

      java.util.NoSuchElementException: None.get
        at scala.None$.get(Option.scala:347)
        at scala.None$.get(Option.scala:345)
        at org.apache.spark.sql.catalyst.expressions.TimeZoneAwareExpression$class.timeZone(datetimeExpressions.scala:46)
        at org.apache.spark.sql.catalyst.expressions.Cast.timeZone$lzycompute(Cast.scala:172)
        at org.apache.spark.sql.catalyst.expressions.Cast.timeZone(Cast.scala:172)
        at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToTimestamp$1$$anonfun$apply$24.apply(Cast.scala:253)
        at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToTimestamp$1$$anonfun$apply$24.apply(Cast.scala:253)
        at org.apache.spark.sql.catalyst.expressions.Cast.org$apache$spark$sql$catalyst$expressions$Cast$$buildCast(Cast.scala:201)
        at org.apache.spark.sql.catalyst.expressions.Cast$$anonfun$castToTimestamp$1.apply(Cast.scala:253)
        at org.apache.spark.sql.catalyst.expressions.Cast.nullSafeEval(Cast.scala:533)
        at org.apache.spark.sql.catalyst.expressions.UnaryExpression.eval(Expression.scala:327)
        at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$fillPartitionKeys$1$1.apply(TableReader.scala:230)
        at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$5$$anonfun$fillPartitionKeys$1$1.apply(TableReader.scala:228)
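
      The None.get failure can be reproduced without Spark using a small Scala model. This is a minimal sketch under stated assumptions: `TimeZoneAware`, `StringToTimestamp`, and `PartitionReader.fillPartitionKey` are hypothetical names that only imitate the lazy `timeZoneId.get` lookup visible in the stack trace. They are not Spark's actual classes, and the real fix in TableReader.scala may differ in detail.

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

// Hypothetical, simplified model of the pattern in the stack trace.
// These are NOT Spark's classes; names and parsing logic are illustrative.
trait TimeZoneAware {
  def timeZoneId: Option[String]

  // The zone is resolved lazily with Option.get, so a missing id
  // throws java.util.NoSuchElementException("None.get") on first use.
  lazy val timeZone: TimeZone = TimeZone.getTimeZone(timeZoneId.get)
}

// Hypothetical stand-in for a string-to-timestamp cast built without a time zone id.
final case class StringToTimestamp(value: String, timeZoneId: Option[String] = None)
    extends TimeZoneAware {

  // Returns microseconds since the epoch.
  def eval(): Long = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
    fmt.setTimeZone(timeZone) // forces the lazy val; fails here if timeZoneId is None
    fmt.parse(value).getTime * 1000L
  }
}

// Hypothetical partition-filling helper: supplying a session time zone id
// when the cast is built avoids the None.get failure in this model.
object PartitionReader {
  def fillPartitionKey(raw: String, sessionTimeZone: Option[String]): Long =
    StringToTimestamp(raw, sessionTimeZone).eval()
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Without a time zone: same failure mode as the stack trace above.
    try {
      PartitionReader.fillPartitionKey("1970-01-01 00:00:01", None)
    } catch {
      case e: NoSuchElementException => println(s"failed: ${e.getMessage}")
    }

    // With a time zone: the cast succeeds.
    println(PartitionReader.fillPartitionKey("1970-01-01 00:00:01", Some("UTC")))
  }
}
```

      Running Demo prints "failed: None.get" followed by 1000000. In this model, the failure depends only on whether a time zone id is supplied when the cast is constructed, not on the partition value itself.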
      

People

    • Assignee: Feng Zhu (donnyzone)
    • Reporter: Zhihao Wang (zhihao)
    • Votes: 0
    • Watchers: 3
