  1. Spark
  2. SPARK-25244

[Python] Setting `spark.sql.session.timeZone` only partially respected


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels:

      Description

      The setting `spark.sql.session.timeZone` is respected by PySpark when converting from and to Pandas, as described here. However, when timestamps are converted directly to Python's `datetime` objects, it is ignored and the system's timezone is used.

      This can be checked with the following code snippet:

      import pyspark.sql
      
      spark = (pyspark
               .sql
               .SparkSession
               .builder
               .master('local[1]')
               .config("spark.sql.session.timeZone", "UTC")
               .getOrCreate()
              )
      
      df = spark.createDataFrame([("2018-06-01 01:00:00",)], ["ts"])
      df = df.withColumn("ts", df["ts"].astype("timestamp"))
      
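      # toPandas() applies spark.sql.session.timeZone (UTC here)
      print(df.toPandas().iloc[0, 0])
      # collect() builds datetime objects in the system timezone instead
      print(df.collect()[0][0])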
      

      For me, this prints the following (the exact result depends on the timezone of your system; mine is Europe/Berlin):

      2018-06-01 01:00:00
      2018-06-01 03:00:00
      

      Hence, the method `toPandas` respected the timezone setting (UTC), but the method `collect` ignored it and converted the timestamp to my system's timezone.

      The cause of this behaviour is that the methods `toInternal` and `fromInternal` of PySpark's `TimestampType` class don't take the setting `spark.sql.session.timeZone` into account and use the system timezone instead.
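To illustrate the cause without needing a Spark installation, here is a minimal sketch using only the Python standard library. It assumes that timestamps are stored internally as microseconds since the Unix epoch. The function names `from_internal_system_tz` and `from_internal_session_tz` are hypothetical and are not part of PySpark; the second one only shows what a session-aware conversion might look like, and a fixed UTC+2 offset stands in for Europe/Berlin in summer.

```python
from datetime import datetime, timedelta, timezone

MICROS = 1_000_000  # assumed internal unit: microseconds since the epoch


def from_internal_system_tz(ts):
    # Approximates the reported behaviour: fromtimestamp() without a tz
    # argument yields a naive datetime in the *system* timezone.
    return datetime.fromtimestamp(ts // MICROS).replace(microsecond=ts % MICROS)


def from_internal_session_tz(ts, session_tz):
    # Hypothetical alternative: convert in an explicitly given timezone,
    # then drop tzinfo so the result is a naive wall-clock time.
    aware = datetime.fromtimestamp(ts // MICROS, tz=session_tz)
    return aware.replace(microsecond=ts % MICROS, tzinfo=None)


# 2018-06-01 01:00:00 UTC, expressed in microseconds since the epoch
ts = int(datetime(2018, 6, 1, 1, 0, tzinfo=timezone.utc).timestamp()) * MICROS

print(from_internal_system_tz(ts))                               # depends on the system timezone
print(from_internal_session_tz(ts, timezone.utc))                # 2018-06-01 01:00:00
print(from_internal_session_tz(ts, timezone(timedelta(hours=2))))  # 2018-06-01 03:00:00
```

On a machine set to Europe/Berlin, the first call would give the same wall-clock time as the UTC+2 call, which matches the `collect` output shown earlier.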

      If the maintainers agree that this should be fixed, I would try to come up with a patch. 

       

       


            People

            • Assignee: Unassigned
            • Reporter: Anton Daitche (adaitche)
