Description
Background
In Spark version 2.4 and earlier, datetime parsing, formatting and conversion are performed by using the hybrid calendar (Julian + Gregorian).
Since the Proleptic Gregorian calendar is the de facto calendar worldwide, as well as the one chosen by the ANSI SQL standard, Spark 3.0 switches to it by using the Java 8 API classes (the java.time packages, which are based on the ISO chronology).
The switch was completed in SPARK-26651.
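To make the calendar difference concrete, the sketch below compares the day count that the legacy `java.util.GregorianCalendar` (hybrid Julian + Gregorian) and `java.time.LocalDate` (Proleptic Gregorian) assign to the same year, month, and day. The class and method names are hypothetical and chosen only for illustration; this is not Spark code.

```java
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class CalendarComparison {
    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Days since 1970-01-01 according to the legacy hybrid (Julian + Gregorian) calendar.
    static long hybridEpochDay(int year, int month, int day) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();                    // reset all fields, so the time of day is midnight
        cal.set(year, month - 1, day);  // Calendar months are zero-based
        return Math.floorDiv(cal.getTimeInMillis(), MILLIS_PER_DAY);
    }

    // Days since 1970-01-01 according to the Proleptic Gregorian calendar (ISO chronology).
    static long prolepticEpochDay(int year, int month, int day) {
        return LocalDate.of(year, month, day).toEpochDay();
    }

    public static void main(String[] args) {
        // Modern dates agree in both calendars.
        System.out.println(hybridEpochDay(2020, 1, 1) == prolepticEpochDay(2020, 1, 1)); // true
        // Dates before the 1582 cutover are Julian in the hybrid calendar, so they differ.
        System.out.println(hybridEpochDay(1000, 1, 1) == prolepticEpochDay(1000, 1, 1)); // false
    }
}
```

The same label therefore maps to different physical days for old dates, which is why results for such dates can change after the switch.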
Problem
Switching to the Java 8 datetime API breaks backward compatibility with Spark 2.4 and earlier when parsing datetimes. Spark needs its own pattern definitions for datetime parsing and formatting.
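One instance of this break is the pattern letter `u`, which `java.text.SimpleDateFormat` treats as the day number of the week and `java.time.format.DateTimeFormatter` treats as the year. The sketch below uses formatting rather than parsing because it makes the difference easy to see; the class and helper names are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class PatternLetterU {
    // Formats the date with the legacy java.text API (the Java 7 pattern semantics).
    static String formatLegacy(LocalDate date, String pattern) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(Date.from(date.atStartOfDay(ZoneOffset.UTC).toInstant()));
    }

    // Formats the date with the java.time API (the Java 8 pattern semantics).
    static String formatJavaTime(LocalDate date, String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.US).format(date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2020, 1, 1); // a Wednesday
        System.out.println(formatLegacy(date, "u"));   // 3 (day number of week, 1 = Monday)
        System.out.println(formatJavaTime(date, "u")); // 2020 (year)
    }
}
```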
Solution
To avoid unexpected result changes after the underlying datetime API switch, we propose the following solution.
- Introduce a fallback mechanism: when the Java 8-based parser fails, fall back to the legacy parser to detect these behavior differences. If the legacy parser succeeds, fail with a user-friendly error message that tells users what changed and how to fix the pattern.
- Document Spark’s datetime patterns: Spark’s datetime formatter is decoupled from the Java patterns. Spark’s patterns are mainly based on the Java 7 patterns (for better backward compatibility), with customized logic to handle the breaking changes between Java 7 and Java 8 pattern strings. Below are the customized rules:
Pattern | Java 7 | Java 8 | Example | Rule |
---|---|---|---|---|
u | Day number of week (1 = Monday, ..., 7 = Sunday) | Year (unlike y, u accepts a negative value to represent BC, whereas y must be used together with G to do the same thing) | | Substitute ‘u’ with ‘e’ and use the Java 8 parser to parse the string. If parsable, return the result; otherwise, fall back to ‘u’ and use the legacy Java 7 parser. If the legacy parser succeeds, throw an exception and ask users to change the pattern string or turn on the legacy mode; otherwise, return NULL, as Spark 2.4 does. |
z | General time zone, which also accepts RFC 822 time zones | Only accepts a time-zone name, e.g. Pacific Standard Time; PST | | The semantics of ‘z’ differ between Java 7 and Java 8. Here, Spark 3.0 follows the semantics of Java 8. Use the Java 8 parser to parse the string. If parsable, return the result; otherwise, use the legacy Java 7 parser. If the legacy parser succeeds, throw an exception and ask users to change the pattern string or turn on the legacy mode; otherwise, return NULL, as Spark 2.4 does. |
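The fallback flow shared by the rules above (try the Java 8 parser, then the legacy parser, then either raise an error or return NULL) can be sketched roughly as follows. This is a minimal illustration and not Spark's implementation: the class `FallbackDateParser`, the exception `PatternUpgradeException`, and the error message are all hypothetical, and the pattern-letter substitution for ‘u’ is left out.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class FallbackDateParser {
    // Hypothetical exception type for this sketch; not an actual Spark class.
    static class PatternUpgradeException extends RuntimeException {
        PatternUpgradeException(String message) {
            super(message);
        }
    }

    // Returns the parsed date, or null when neither parser accepts the input.
    static LocalDate parse(String text, String pattern) {
        try {
            // Step 1: try the Java 8 (java.time) parser first.
            return LocalDate.parse(text, DateTimeFormatter.ofPattern(pattern, Locale.US));
        } catch (DateTimeParseException newParserFailure) {
            try {
                // Step 2: probe the legacy parser only to detect a behavior difference.
                new SimpleDateFormat(pattern, Locale.US).parse(text);
            } catch (ParseException legacyFailure) {
                // Both parsers reject the input: return null, mirroring the NULL result.
                return null;
            }
            // The legacy parser accepted what the new one rejected: tell the user.
            throw new PatternUpgradeException(
                "Failed to parse '" + text + "' with pattern '" + pattern
                    + "' in the new parser, although the legacy parser accepts it. "
                    + "Change the pattern string or turn on the legacy mode.");
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2020-01-01", "yyyy-MM-dd")); // 2020-01-01
        System.out.println(parse("not a date", "yyyy-MM-dd")); // null
        try {
            parse("2020-1-1", "yyyy-MM-dd"); // legacy parser accepts single-digit fields
        } catch (PatternUpgradeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Raising an error instead of silently returning the legacy result keeps the new semantics as the default while making the behavior change visible to users who depend on the old one.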