Description
cases
spark-sql> select to_timestamp('2020-1-1', 'YYYY-w-u');
2019-12-29 00:00:00
spark-sql> set spark.sql.legacy.timeParserPolicy=legacy;
spark.sql.legacy.timeParserPolicy legacy
spark-sql> select to_timestamp('2020-1-1', 'YYYY-w-u');
2019-12-30 00:00:00
reasons
These week-based fields need a Locale to express their semantics, because the first day of the week varies from country to country.
From the Javadoc of WeekFields:
/**
 * Gets the first day-of-week.
 * <p>
 * The first day-of-week varies by culture.
 * For example, the US uses Sunday, while France and the ISO-8601 standard use Monday.
 * This method returns the first day using the standard {@code DayOfWeek} enum.
 *
 * @return the first day-of-week, not null
 */
public DayOfWeek getFirstDayOfWeek() {
    return firstDayOfWeek;
}
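As a quick check of the Javadoc quoted above, the following minimal sketch prints the first day-of-week that java.time reports for a few locales. The class name `FirstDayOfWeekDemo` and the helper `firstDay` are made up for illustration; only `WeekFields.of`, `WeekFields.ISO` and `getFirstDayOfWeek` are JDK APIs.

```java
import java.time.DayOfWeek;
import java.time.temporal.WeekFields;
import java.util.Locale;

// Hypothetical demo class, not part of Spark.
final class FirstDayOfWeekDemo {

    // Returns the first day-of-week that java.time associates with the given locale.
    static DayOfWeek firstDay(Locale locale) {
        return WeekFields.of(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        System.out.println("US:     " + firstDay(Locale.US));                 // SUNDAY
        System.out.println("France: " + firstDay(Locale.FRANCE));             // MONDAY
        System.out.println("ISO:    " + WeekFields.ISO.getFirstDayOfWeek());  // MONDAY
    }
}
```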
But for SimpleDateFormat, the day-of-week number is not localized:
```
u    Day number of week (1 = Monday, ..., 7 = Sunday)    Number    1
```
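Currently, the default locale we use is the US, where the week starts on Sunday, so day number 1 resolves to Sunday instead of Monday and the result moves one day backward (2019-12-29 instead of 2019-12-30).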
For other countries, please refer to First Day of the Week in Different Countries
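The mismatch can probably be reproduced outside Spark with plain JDK classes. The sketch below assumes that the new parser behaves like a `DateTimeFormatter` with 'u' replaced by 'e', and that the legacy parser behaves like `SimpleDateFormat`. The class and method names are hypothetical, and the real Spark formatters add their own configuration on top.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical demo class, not part of Spark.
final class WeekDayParsingDemo {

    // java.time: 'e' is the localized day-of-week, so 1 means the locale's first day.
    static LocalDate parseLocalized(String text, Locale locale) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern("YYYY-w-e", locale));
    }

    // Legacy: 'u' is always 1 = Monday, regardless of the locale's first day of the week.
    static String parseLegacy(String text, Locale locale) throws ParseException {
        SimpleDateFormat parser = new SimpleDateFormat("YYYY-w-u", locale);
        // Format with the same default time zone so the calendar date round-trips.
        return new SimpleDateFormat("yyyy-MM-dd", locale).format(parser.parse(text));
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parseLocalized("2020-1-1", Locale.US)); // 2019-12-29 (a Sunday)
        System.out.println(parseLegacy("2020-1-1", Locale.US));    // 2019-12-30 (a Monday)
    }
}
```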
solution options
1. Use new Locale("en", "GB") as default locale.
2. For JDK 10 and onwards, we can set the locale Unicode extension 'fw' to 'mon', but this does not work on earlier JDKs.
3. Forbid 'u', throw a clear exception for users, and enable and document 'e'/'c' instead. Currently, 'u' is internally substituted by 'e', but the two are not equivalent.
Options 1 and 2 solve this for the default locale, but not for functions that accept a custom locale.
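Options 1 and 2 could be sketched with plain JDK classes as follows. This is only an illustration, not the Spark change itself: the class name and the `parse` helper are hypothetical, `Locale.UK` stands in for `new Locale("en", "GB")`, and the language tag `en-US-u-fw-mon` is an assumed way of spelling the 'fw' extension. The 'fw' result is printed without an expected value because it depends on the JDK version.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.WeekFields;
import java.util.Locale;

// Hypothetical demo class, not part of Spark.
final class LocaleOptionsDemo {

    // Parses with the localized week fields of the given locale.
    static LocalDate parse(String text, Locale locale) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern("YYYY-w-e", locale));
    }

    public static void main(String[] args) {
        // Option 1: a locale whose week starts on Monday (same as new Locale("en", "GB")).
        System.out.println(parse("2020-1-1", Locale.UK)); // 2019-12-30

        // Option 2: keep the US locale but override the first day of the week.
        // Only JDK 10+ is expected to honor the 'fw' extension; older JDKs ignore it.
        Locale usMondayFirst = Locale.forLanguageTag("en-US-u-fw-mon");
        System.out.println(WeekFields.of(usMondayFirst).getFirstDayOfWeek());
    }
}
```

Neither variant helps when the caller passes an explicit locale, which is why option 3 is listed separately.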