Description
Background
According to the documentation, the pattern `F` should give the week of the month.
| Symbol | Meaning | Presentation | Example |
|--------|---------|--------------|---------|
| F | week-of-month | number(1) | 3 |
Test Data
Here is my test data, a CSV file with a single `date` column:
date
2020-08-01
2020-08-02
2020-08-03
2020-08-04
2020-08-05
2020-08-06
2020-08-07
2020-08-08
2020-08-09
2020-08-10
Steps to reproduce
I tested this with both Scala Spark 3.0.0 and PySpark 3.0.0:
// Spark
df.withColumn("date", to_timestamp('date, "yyyy-MM-dd"))
  .withColumn("week", date_format('date, "F")).show

+-------------------+----+
|               date|week|
+-------------------+----+
|2020-08-01 00:00:00|   1|
|2020-08-02 00:00:00|   2|
|2020-08-03 00:00:00|   3|
|2020-08-04 00:00:00|   4|
|2020-08-05 00:00:00|   5|
|2020-08-06 00:00:00|   6|
|2020-08-07 00:00:00|   7|
|2020-08-08 00:00:00|   1|
|2020-08-09 00:00:00|   2|
|2020-08-10 00:00:00|   3|
+-------------------+----+

# pyspark
df.withColumn('date', to_timestamp('date', 'yyyy-MM-dd')) \
  .withColumn('week', date_format('date', 'F')) \
  .show(10, False)

+-------------------+----+
|date               |week|
+-------------------+----+
|2020-08-01 00:00:00|1   |
|2020-08-02 00:00:00|2   |
|2020-08-03 00:00:00|3   |
|2020-08-04 00:00:00|4   |
|2020-08-05 00:00:00|5   |
|2020-08-06 00:00:00|6   |
|2020-08-07 00:00:00|7   |
|2020-08-08 00:00:00|1   |
|2020-08-09 00:00:00|2   |
|2020-08-10 00:00:00|3   |
+-------------------+----+
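The numbers in both outputs appear to count 1 through 7 from the first day of the month and then start over. The plain-Python sketch below (no Spark involved) reproduces that sequence for the ten test dates. It is only a reading of the rows shown here, not a statement about how Spark implements `F`, and the helper name `day_in_seven_day_block` is hypothetical.

```python
from datetime import date

# The same ten dates as in the test data.
dates = [date(2020, 8, d) for d in range(1, 11)]

def day_in_seven_day_block(d: date) -> int:
    """Position (1-7) of d within seven-day blocks counted from the 1st of the month.

    Hypothetical helper: it ignores the calendar weekday entirely.
    """
    return (d.day - 1) % 7 + 1

observed_like = [day_in_seven_day_block(d) for d in dates]
print(observed_like)  # [1, 2, 3, 4, 5, 6, 7, 1, 2, 3]
```

The printed list matches the `week` column above row for row.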
Expected result
The `week` column is not the week of the month. The values run from 1 to 7 starting at the first day of the month and then repeat, like a day-of-week number.
From my calendar, August 1st should have week-of-month 1, August 2nd through 8th should have 2, and so on:
+-------------------+----+
|date               |week|
+-------------------+----+
|2020-08-01 00:00:00|1   |
|2020-08-02 00:00:00|2   |
|2020-08-03 00:00:00|2   |
|2020-08-04 00:00:00|2   |
|2020-08-05 00:00:00|2   |
|2020-08-06 00:00:00|2   |
|2020-08-07 00:00:00|2   |
|2020-08-08 00:00:00|2   |
|2020-08-09 00:00:00|3   |
|2020-08-10 00:00:00|3   |
+-------------------+----+
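For reference, the expected column can be reproduced in plain Python. This sketch assumes weeks start on Sunday and that week 1 is the (possibly partial) week containing the 1st of the month. Both assumptions are inferred from the expected table, not taken from the Spark documentation. The function name `week_of_month_sunday_start` is hypothetical.

```python
from datetime import date

def week_of_month_sunday_start(d: date) -> int:
    """Week of month, assuming Sunday-start weeks and a partial first week."""
    first = d.replace(day=1)
    # Python's weekday() is Monday=0 .. Sunday=6; shift so Sunday=0 .. Saturday=6.
    offset = (first.weekday() + 1) % 7
    return (d.day - 1 + offset) // 7 + 1

# The same ten dates as in the test data.
dates = [date(2020, 8, d) for d in range(1, 11)]
expected = [week_of_month_sunday_start(d) for d in dates]
print(expected)  # [1, 2, 2, 2, 2, 2, 2, 2, 3, 3]
```

August 1st, 2020 falls on a Saturday, so the first week holds only that one day and week 2 begins on Sunday the 2nd. The same arithmetic could presumably be expressed in Spark with `dayofmonth` and `dayofweek`, though that is not tested here.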