SPARK-31879 (sub-task of SPARK-31408: Build Spark’s own datetime pattern definition)

First day of week changed for non-MONDAY_START Locales

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 3.1.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels:
      None

      Description

      cases

      spark-sql> select to_timestamp('2020-1-1', 'YYYY-w-u');
      2019-12-29 00:00:00
      spark-sql> set spark.sql.legacy.timeParserPolicy=legacy;
      spark.sql.legacy.timeParserPolicy	legacy
      spark-sql> select to_timestamp('2020-1-1', 'YYYY-w-u');
      2019-12-30 00:00:00
      

      reasons

      These week-based fields need a Locale to express their semantics, because the first day of the week varies from country to country.

      From the Javadoc of WeekFields:

          /**
           * Gets the first day-of-week.
           * <p>
           * The first day-of-week varies by culture.
           * For example, the US uses Sunday, while France and the ISO-8601 standard use Monday.
           * This method returns the first day using the standard {@code DayOfWeek} enum.
           *
           * @return the first day-of-week, not null
           */
          public DayOfWeek getFirstDayOfWeek() {
              return firstDayOfWeek;
          }
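
To make the Javadoc above concrete, here is a minimal sketch that prints the first day of the week for a few locales. The class name `FirstDayOfWeekDemo` and the helper `firstDay` are made up for illustration and are not part of Spark.

```java
import java.time.DayOfWeek;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class FirstDayOfWeekDemo {
    // Hypothetical helper: looks up the locale-specific week definition
    // and returns the day it treats as the start of the week.
    static DayOfWeek firstDay(Locale locale) {
        return WeekFields.of(locale).getFirstDayOfWeek();
    }

    public static void main(String[] args) {
        System.out.println(firstDay(Locale.US));                // SUNDAY
        System.out.println(firstDay(Locale.UK));                // MONDAY
        System.out.println(firstDay(Locale.FRANCE));            // MONDAY
        System.out.println(WeekFields.ISO.getFirstDayOfWeek()); // MONDAY
    }
}
```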
      

      But for SimpleDateFormat, the day-of-week is not localized:

      ```
      u Day number of week (1 = Monday, ..., 7 = Sunday) Number 1
      ```

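      Currently, the default locale we use is the US, where the week starts on Sunday, so day-of-week 1 resolves to Sunday 2019-12-29 instead of Monday 2019-12-30, and the result moves one day backward.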
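
The difference can be reproduced outside Spark with plain JDK classes. The sketch below assumes that the new parser behaves like `DateTimeFormatter` with `'u'` replaced by `'e'` and that the legacy parser behaves like `SimpleDateFormat`; the class and method names are hypothetical, and the time zone is pinned to UTC only to keep the output deterministic.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class WeekPatternDemo {
    // java.time: 'e' is the localized day-of-week, so 1 means
    // "the first day of the week in this locale".
    static LocalDate parseWithJavaTime(String text, Locale locale) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("YYYY-w-e", locale);
        return LocalDate.parse(text, fmt);
    }

    // SimpleDateFormat: 'u' is always 1 = Monday ... 7 = Sunday,
    // regardless of the locale's first day of the week.
    static String parseWithSimpleDateFormat(String text, Locale locale) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat in = new SimpleDateFormat("YYYY-w-u", locale);
        in.setTimeZone(utc);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        out.setTimeZone(utc);
        try {
            Date parsed = in.parse(text);
            return out.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Week 1 of week-year 2020 in the US runs from Sunday 2019-12-29.
        System.out.println(parseWithJavaTime("2020-1-1", Locale.US));         // 2019-12-29
        System.out.println(parseWithSimpleDateFormat("2020-1-1", Locale.US)); // 2019-12-30
    }
}
```

Both calls agree on the week itself; they differ only in which day of that week the value 1 denotes.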

      For other countries, please refer to First Day of the Week in Different Countries

      solution options

      1. Use new Locale("en", "GB") as the default locale.
      2. On JDK 10 and later, set the locale Unicode extension 'fw' to 'mon'; this does not work on earlier JDKs.
      3. Forbid 'u' and raise a proper exception for users, then enable and document 'e/c'. Currently, 'u' is internally substituted by 'e', but the two are not equivalent.

      Options 1 and 2 solve this for the default locale, but not for functions that support a custom locale.
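
A rough sketch of what options 1 and 2 look like at the JDK level, again using `DateTimeFormatter` as a stand-in for Spark's parser. The class name `LocaleOptionsDemo` and its `parse` helper are hypothetical. The last two lines print values that depend on the JDK version (the 'fw' extension is only honored by `WeekFields` on newer JDKs), so no expected output is given for them.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.WeekFields;
import java.util.Locale;

public class LocaleOptionsDemo {
    // Hypothetical helper: parses week-year, week and localized day-of-week.
    static LocalDate parse(String text, Locale locale) {
        return LocalDate.parse(text, DateTimeFormatter.ofPattern("YYYY-w-e", locale));
    }

    public static void main(String[] args) {
        // Option 1: a Monday-start locale as the default.
        Locale enGb = new Locale("en", "GB");
        System.out.println(parse("2020-1-1", enGb)); // 2019-12-30

        // Option 2: keep en-US but request a Monday start via the 'fw' extension.
        Locale usMondayStart = Locale.forLanguageTag("en-US-u-fw-mon");
        System.out.println(usMondayStart.getUnicodeLocaleType("fw")); // mon

        // JDK-dependent: older JDKs ignore 'fw' when building WeekFields.
        System.out.println(WeekFields.of(usMondayStart).getFirstDayOfWeek());
        System.out.println(parse("2020-1-1", usMondayStart));
    }
}
```

Neither option helps when the caller passes its own locale, which is why option 3 addresses the pattern letter itself.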

      cc Wenchen Fan Dongjoon Hyun Takeshi Yamamuro

            People

            • Assignee:
              Qin Yao Kent Yao
              Reporter:
              Qin Yao Kent Yao