SPARK-32683

Datetime Pattern F not working as expected

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.1, 3.1.0
    • Component/s: SQL
    • Labels: None
    • Environment: Windows 10 Pro

      • with Jupyter Lab - Docker Image 
        • jupyter/all-spark-notebook:f1811928b3dd 
          • spark 3.0.0
          • python 3.8.5
          • openjdk 11.0.8

    Description

      Background

      From the documentation, the pattern F should give a week of the month.

      Symbol  Meaning        Presentation  Example
      F       week-of-month  number(1)     3

      Test Data

      My test data is a CSV file with a single `date` column:

      date
      2020-08-01
      2020-08-02
      2020-08-03
      2020-08-04
      2020-08-05
      2020-08-06
      2020-08-07
      2020-08-08
      2020-08-09
      2020-08-10 

      Steps to reproduce

      I tested this with both Scala Spark 3.0.0 and PySpark 3.0.0:

      // Spark
      
      df.withColumn("date", to_timestamp('date, "yyyy-MM-dd"))
        .withColumn("week", date_format('date, "F")).show
      
      +-------------------+----+
      |               date|week|
      +-------------------+----+
      |2020-08-01 00:00:00|   1|
      |2020-08-02 00:00:00|   2|
      |2020-08-03 00:00:00|   3|
      |2020-08-04 00:00:00|   4|
      |2020-08-05 00:00:00|   5|
      |2020-08-06 00:00:00|   6|
      |2020-08-07 00:00:00|   7|
      |2020-08-08 00:00:00|   1|
      |2020-08-09 00:00:00|   2|
      |2020-08-10 00:00:00|   3|
      +-------------------+----+
      
      
      # pyspark
      
      from pyspark.sql.functions import date_format, to_timestamp

      df.withColumn('date', to_timestamp('date', 'yyyy-MM-dd')) \
        .withColumn('week', date_format('date', 'F')) \
        .show(10, False)
      
      +-------------------+----+
      |date               |week|
      +-------------------+----+
      |2020-08-01 00:00:00|1   |
      |2020-08-02 00:00:00|2   |
      |2020-08-03 00:00:00|3   |
      |2020-08-04 00:00:00|4   |
      |2020-08-05 00:00:00|5   |
      |2020-08-06 00:00:00|6   |
      |2020-08-07 00:00:00|7   |
      |2020-08-08 00:00:00|1   |
      |2020-08-09 00:00:00|2   |
      |2020-08-10 00:00:00|3   |
      +-------------------+----+
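
The values in both outputs follow a simple pattern: they count from 1 to 7 and restart every seven days, starting at the first day of the month. The plain-Python sketch below reproduces those numbers for the ten test dates. It is only an observation about the output above, not a statement about how Spark computes `F` internally, and the helper name `aligned_day_in_month` is made up for illustration.

```python
from datetime import date, timedelta

# The ten dates from the test data: 2020-08-01 through 2020-08-10.
dates = [date(2020, 8, 1) + timedelta(days=i) for i in range(10)]


def aligned_day_in_month(d: date) -> int:
    """Hypothetical helper: position of d inside consecutive 7-day
    blocks that start on day 1 of the month (1..7, then repeats)."""
    return (d.day - 1) % 7 + 1


observed_like = [aligned_day_in_month(d) for d in dates]
print(observed_like)  # [1, 2, 3, 4, 5, 6, 7, 1, 2, 3]
```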

      Expected result

      The `week` column is not the week of the month. It looks like a day counter instead: it runs from 1 to 7 and restarts every seven days, counting from the first day of the month.

      According to my calendar (weeks starting on Sunday), August 1 should be week 1 of the month, August 2 through 8 should be week 2, and so on:

      +-------------------+----+
      |date               |week|
      +-------------------+----+
      |2020-08-01 00:00:00|1   |
      |2020-08-02 00:00:00|2   |
      |2020-08-03 00:00:00|2   |
      |2020-08-04 00:00:00|2   |
      |2020-08-05 00:00:00|2   |
      |2020-08-06 00:00:00|2   |
      |2020-08-07 00:00:00|2   |
      |2020-08-08 00:00:00|2   |
      |2020-08-09 00:00:00|3   |
      |2020-08-10 00:00:00|3   |
      +-------------------+----+
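
The expected column can be derived with a little calendar arithmetic. The sketch below is plain Python rather than Spark code, assumes weeks start on Sunday (as in the calendar described above), and uses a hypothetical helper name, `week_of_month_sunday_start`.

```python
from datetime import date, timedelta


def week_of_month_sunday_start(d: date) -> int:
    """Hypothetical helper: week-of-month for d, assuming weeks start on
    Sunday and the first (possibly partial) week of the month is week 1."""
    first = d.replace(day=1)
    # date.weekday() is Monday=0 ... Sunday=6; shift so Sunday=0 ... Saturday=6.
    offset = (first.weekday() + 1) % 7
    return (d.day - 1 + offset) // 7 + 1


# The ten dates from the test data: 2020-08-01 through 2020-08-10.
dates = [date(2020, 8, 1) + timedelta(days=i) for i in range(10)]
expected_weeks = [week_of_month_sunday_start(d) for d in dates]
print(expected_weeks)  # [1, 2, 2, 2, 2, 2, 2, 2, 3, 3]
```

August 1, 2020 is a Saturday, so it forms week 1 on its own and week 2 begins on Sunday, August 2. The same arithmetic could presumably be expressed in Spark with `dayofmonth` and `dayofweek` (which numbers Sunday as 1), but that variant is not tested here.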

      Attachments

        1. comment.png
          93 kB
          Daeho Ro

          People

            Qin Yao Kent Yao 2
            lamanus Daeho Ro