SPARK-13268: SQL Timestamp stored as GMT but toString returns GMT-08:00

Project: Spark
Parent issue: SPARK-15834 (Time zone / locale sensitivity umbrella)


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 1.6.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      There is an issue with how timestamps are displayed and converted to Strings in Spark SQL. The documentation states that timestamps should be created in the GMT time zone; however, when we do so, the output shows a -8 hour offset:

      import java.sql.Timestamp
      import java.time.ZonedDateTime

      new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT]").toInstant.toEpochMilli)
      res144: java.sql.Timestamp = 2014-12-31 16:00:00.0
      
      new Timestamp(ZonedDateTime.parse("2015-01-01T00:00:00Z[GMT-08:00]").toInstant.toEpochMilli)
      res145: java.sql.Timestamp = 2015-01-01 00:00:00.0
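      These outputs are consistent with how java.sql.Timestamp works. It stores only an instant (epoch milliseconds plus nanoseconds), and its toString renders that instant in the JVM's default time zone. The reporter's JVM was presumably running in a UTC-08:00 zone such as US Pacific time. That is an assumption inferred from the output; the report does not state it. The minimal sketch below reproduces the first result under two different default zones. The object TimestampDisplay and its method are hypothetical names.

```scala
import java.sql.Timestamp
import java.time.Instant
import java.util.TimeZone

object TimestampDisplay {
  // Midnight GMT on 2015-01-01, the instant used in the first example above.
  val millis: Long = Instant.parse("2015-01-01T00:00:00Z").toEpochMilli

  // Renders the same epoch milliseconds while the JVM default zone is zoneId.
  // Timestamp.toString consults the default zone, so the text changes even
  // though the stored value does not. The previous default is restored afterwards.
  def renderWithDefault(zoneId: String): String = {
    val saved = TimeZone.getDefault
    try {
      TimeZone.setDefault(TimeZone.getTimeZone(zoneId))
      new Timestamp(millis).toString
    } finally {
      TimeZone.setDefault(saved)
    }
  }

  def main(args: Array[String]): Unit = {
    println(renderWithDefault("America/Los_Angeles")) // 2014-12-31 16:00:00.0
    println(renderWithDefault("UTC"))                 // 2015-01-01 00:00:00.0
  }
}
```

      renderWithDefault changes a JVM-wide setting while it runs, so it is only suitable for a single-threaded illustration like this one.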
      

      This result is confusing and unintuitive. It also causes problems when DataFrames containing timestamps are converted to RDDs that are then saved as text: timestamps at midnight GMT are written as 16:00 on the previous day, effectively shifting all dates in such a dataset back by one day.
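      If each text line is built by calling toString on the values, the JVM default zone leaks into the written output. The minimal sketch below assumes you control the formatting step before the text is written, for example in the map that precedes saveAsTextFile. In that case the instant can be rendered explicitly in UTC. The object UtcTextExport and its methods are hypothetical helpers, not a Spark API.

```scala
import java.sql.Timestamp
import java.time.ZoneOffset
import java.util.TimeZone

object UtcTextExport {
  // Hypothetical helper: ISO-8601 text in UTC (e.g. "2015-01-01T00:00:00Z"),
  // independent of the JVM default time zone.
  def toUtcText(ts: Timestamp): String = ts.toInstant.toString

  // Hypothetical helper: the calendar date of the instant in UTC.
  def utcDate(ts: Timestamp): String =
    ts.toInstant.atOffset(ZoneOffset.UTC).toLocalDate.toString

  def main(args: Array[String]): Unit = {
    // Simulate a JVM running in a UTC-08:00 zone, as assumed for the report.
    TimeZone.setDefault(TimeZone.getTimeZone("America/Los_Angeles"))
    val ts = new Timestamp(1420070400000L) // 2015-01-01T00:00:00Z
    println(ts.toString.take(10)) // 2014-12-31: the date has moved back one day
    println(utcDate(ts))          // 2015-01-01
    println(toUtcText(ts))        // 2015-01-01T00:00:00Z
  }
}
```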

      The suggested fix is to change the timestamp's toString representation to either (a) include the time zone, or (b) display the value in GMT.
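      The two proposed representations can be sketched as standalone formatting functions. This assumes the value is formatted by code you control rather than by Timestamp.toString, which belongs to the JDK and cannot be changed from Spark. The object ProposedRenderings, its methods, and the exact patterns are hypothetical.

```scala
import java.sql.Timestamp
import java.time.{ZoneId, ZoneOffset}
import java.time.format.DateTimeFormatter

object ProposedRenderings {
  // Same layout as Timestamp.toString for whole seconds, optionally followed by the offset.
  private val withOffsetFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S xxx")
  private val plainFormat = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S")

  // Option (a): wall-clock time in the given zone, followed by its UTC offset.
  def withZone(ts: Timestamp, zone: ZoneId): String =
    ts.toInstant.atZone(zone).format(withOffsetFormat)

  // Option (b): always render the instant in GMT/UTC.
  def inGmt(ts: Timestamp): String =
    ts.toInstant.atOffset(ZoneOffset.UTC).format(plainFormat)

  def main(args: Array[String]): Unit = {
    val ts = new Timestamp(1420070400000L) // 2015-01-01T00:00:00Z
    println(withZone(ts, ZoneId.of("America/Los_Angeles"))) // 2014-12-31 16:00:00.0 -08:00
    println(inGmt(ts))                                      // 2015-01-01 00:00:00.0
  }
}
```

      Option (a) keeps the local wall-clock time but makes it unambiguous. Option (b) makes the text identical on every JVM regardless of its default zone.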

      However, such a change could introduce substantial and subtle bugs, so I'm not sure how best to resolve this.

People

    • Assignee: Unassigned
    • Reporter: Ilya Ganelin (ilganeli)
    • Votes: 0
    • Watchers: 4
