Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-22049

Confusing behavior of from_utc_timestamp and to_utc_timestamp

    XMLWordPrintableJSON

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.3.0
    • Component/s: Documentation, SQL
    • Labels:
      None

      Description

      Hello everyone,

      I am confused about the behavior of the functions from_utc_timestamp and to_utc_timestamp. As an example, take the following code to a spark shell

      import java.sql.Timestamp 
      import org.apache.spark.sql.Row
      import org.apache.spark.sql.types._
      
      // 2017-07-14 02:40 UTC
      val rdd = sc.parallelize(Row(new Timestamp(1500000000000L)) :: Nil)
      val df = spark.createDataFrame(rdd, StructType(StructField("date", TimestampType) :: Nil))
      
      df.select(df("date"), from_utc_timestamp(df("date"), "GMT+01:00") as "from_utc", to_utc_timestamp(df("date"), "GMT+01:00") as "to_utc").show(1, false)
      
      // Date format printing is dependent on the timezone of the machine. 
      // The following is in UTC
      // +---------------------+---------------------+---------------------+             
      // |date                 |from_utc             |to_utc               |
      // +---------------------+---------------------+---------------------+
      // |2017-07-14 02:40:00.0|2017-07-14 03:40:00.0|2017-07-14 01:40:00.0|
      // +---------------------+---------------------+---------------------+
      
      df.select(unix_timestamp(df("date")) as "date", unix_timestamp(from_utc_timestamp(df("date"), "GMT+01:00")) as "from_utc", unix_timestamp(to_utc_timestamp(df("date"),  "GMT+01:00")) as "to_utc").show(1, false)
      // +----------+----------+----------+
      // |date      |from_utc  |to_utc    |
      // +----------+----------+----------+
      // |1500000000|1500003600|1499996400|
      // +----------+----------+----------+
      

      So, if interpret correctly, from_utc_timestamp took 02:40 UTC interpreted it as 03:40 GMT+1 (same timestamp) and transformed it to 03:40 UTC. However the description of from_utc_timestamp says

      Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp that corresponds to the same time of day in the given timezone.

      I would have then expected that the function take 02:40 UTC and return 02:40 GMT+1 = 01:40 UTC. In fact, I think the descriptions of from_utc_timestamp and to_utc_timestamp are inverted.

      I am interpreting this right?

      Thanks in advance
      Felipe

        Attachments

          Activity

            People

            • Assignee:
              srowen Sean Owen
              Reporter:
              Bunder Felipe Olmos
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: