SPARK-31030 (sub-task of SPARK-31408: Build Spark’s own datetime pattern definition)

Backward Compatibility for Parsing and Formatting Datetime

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      Background

      In Spark version 2.4 and earlier, datetime parsing, formatting and conversion are performed by using the hybrid calendar (Julian + Gregorian). 

      Since the Proleptic Gregorian calendar is the de facto calendar worldwide, as well as the one chosen in the ANSI SQL standard, Spark 3.0 switches to it by using the Java 8 API classes (the java.time packages, which are based on the ISO chronology).

      The switch was completed in SPARK-26651.
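
To make the calendar difference concrete, here is a minimal Java sketch (not taken from Spark's code base) that counts the days between 1582-10-04 and 1582-10-15 under both calendars. It assumes the default cutover date of java.util.GregorianCalendar; the class and method names are hypothetical and exist only for illustration.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical demo class; not part of Spark.
final class CalendarGapDemo {
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Epoch millis of midnight UTC for a date in the hybrid (Julian + Gregorian) calendar.
    private static long hybridMillis(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();                    // reset all fields, including time of day
        cal.set(year, month - 1, day);  // Calendar months are 0-based
        return cal.getTimeInMillis();
    }

    // Day count in the hybrid calendar used by the legacy java.util classes.
    static long hybridDaysBetween(int y1, int m1, int d1, int y2, int m2, int d2) {
        return (hybridMillis(y2, m2, d2) - hybridMillis(y1, m1, d1)) / MILLIS_PER_DAY;
    }

    // Day count in the Proleptic Gregorian calendar used by java.time.
    static long prolepticDaysBetween(int y1, int m1, int d1, int y2, int m2, int d2) {
        return ChronoUnit.DAYS.between(LocalDate.of(y1, m1, d1), LocalDate.of(y2, m2, d2));
    }

    public static void main(String[] args) {
        // The hybrid calendar skips 1582-10-05 .. 1582-10-14 at the Julian/Gregorian cutover.
        System.out.println("hybrid:    " + hybridDaysBetween(1582, 10, 4, 1582, 10, 15));    // 1
        System.out.println("proleptic: " + prolepticDaysBetween(1582, 10, 4, 1582, 10, 15)); // 11
    }
}
```

The same pair of dates is 1 day apart in the hybrid calendar and 11 days apart in the Proleptic Gregorian calendar, which is why old dates can shift when the underlying API changes.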

       

      Problem

      Switching to the Java 8 datetime API breaks backward compatibility with Spark 2.4 and earlier when parsing datetime strings. Spark needs its own pattern definitions for datetime parsing and formatting.
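
As an illustration of the kind of incompatibility involved, the following Java sketch formats the same date with the pattern letter 'u' through the legacy java.text.SimpleDateFormat and through java.time.format.DateTimeFormatter. The class and method names are hypothetical; Locale.US and UTC are assumptions made to keep the output deterministic.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Spark.
final class PatternLetterDemo {
    // Format with the legacy (Java 7 style) formatter.
    static String formatLegacy(String pattern, LocalDate date) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        Date legacyDate = Date.from(date.atStartOfDay(ZoneOffset.UTC).toInstant());
        return sdf.format(legacyDate);
    }

    // Format with the Java 8 java.time formatter.
    static String formatJava8(String pattern, LocalDate date) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2020, 3, 4); // a Wednesday
        System.out.println(formatLegacy("u", date)); // "3": day number of week, Monday = 1
        System.out.println(formatJava8("u", date));  // "2020": year
    }
}
```

The same one-letter pattern yields a day-of-week number in one API and a year in the other, so a pattern written for Spark 2.4 can silently change meaning.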

       

      Solution

      To avoid unexpected result changes after the underlying datetime API switch, we propose the following solution. 

      • Introduce a fallback mechanism: when the Java 8-based parser fails, retry with the legacy parser to detect whether the failure is caused by a behavior difference. If the legacy parser succeeds, fail with a user-friendly error message that tells users what changed and how to fix the pattern.
      • Document Spark’s datetime patterns: Spark’s date-time formatter is decoupled from the Java patterns. Spark’s patterns are mainly based on the Java 7 patterns (for better backward compatibility), with customized logic where the pattern strings changed incompatibly between Java 7 and Java 8. The customized rules are listed below:
      Pattern: u
        Java 7: Day number of week (1 = Monday, ..., 7 = Sunday)
        Java 8: Year (unlike y, u accepts a negative value to represent BC, while y must be used together with G to do the same thing)
        Example: -
        Rule: Substitute ‘u’ with ‘e’ and use the Java 8 parser to parse the string. If it is parsable, return the result; otherwise, fall back to ‘u’ and use the legacy Java 7 parser. If the legacy parser succeeds, throw an exception and ask users to change the pattern string or turn on the legacy mode; otherwise, return NULL as Spark 2.4 does.

      Pattern: z
        Java 7: General time zone, which also accepts RFC 822 time zones
        Java 8: Only accepts a time-zone name, e.g. Pacific Standard Time; PST
        Example: -
        Rule: The semantics of ‘z’ differ between Java 7 and Java 8. Here, Spark 3.0 follows the semantics of Java 8. Use the Java 8 parser to parse the string. If it is parsable, return the result; otherwise, use the legacy Java 7 parser. If the legacy parser succeeds, throw an exception and ask users to change the pattern string or turn on the legacy mode; otherwise, return NULL as Spark 2.4 does.
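
The fallback flow described above can be sketched in plain Java as follows. This is a simplified illustration under stated assumptions, not Spark's implementation: the class, the exception type and the error message are hypothetical, the ‘u’ to ‘e’ substitution is left out, and the legacy parser is represented by java.text.SimpleDateFormat with Locale.US.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical sketch of the fallback mechanism; not Spark's actual code.
final class FallbackDateParser {

    // Hypothetical exception signalling a behavior change between parsers.
    static final class PatternUpgradeException extends RuntimeException {
        PatternUpgradeException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Returns the parsed date, returns null if neither parser accepts the input,
    // and throws if only the legacy parser accepts it.
    static LocalDate parse(String text, String pattern) {
        try {
            // Step 1: try the Java 8 parser first.
            return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern, Locale.US));
        } catch (DateTimeParseException java8Failure) {
            try {
                // Step 2: probe the legacy parser only to detect a behavior difference.
                new SimpleDateFormat(pattern, Locale.US).parse(text);
            } catch (ParseException legacyFailure) {
                // Step 3a: both parsers reject the input, so return NULL as Spark 2.4 does.
                return null;
            }
            // Step 3b: the legacy parser accepted the input, so tell the user what changed.
            throw new PatternUpgradeException(
                "Failed to parse '" + text + "' with pattern '" + pattern
                    + "' in the new parser, although the legacy parser accepts it. "
                    + "Change the pattern string or turn on the legacy mode.",
                java8Failure);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2020-03-04", "yyyy-MM-dd")); // parsed by the Java 8 parser
        System.out.println(parse("not a date", "yyyy-MM-dd")); // null: rejected by both parsers
        try {
            parse("2020-3-4", "yyyy-MM-dd"); // only the legacy parser accepts single digits for MM
        } catch (PatternUpgradeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The legacy result is deliberately discarded: the legacy parser serves only as a detector, so a successful Java 8 parse is the only way a value is returned.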

       

       

       

      Attachments

        1. image-2020-03-04-10-54-05-208.png
          22 kB
          Yuanjian Li
        2. image-2020-03-04-10-54-13-238.png
          8 kB
          Yuanjian Li

          People

            Assignee: Yuanjian Li (XuanYuan)
            Reporter: Yuanjian Li (XuanYuan)