Spark / SPARK-25244

[Python] Setting `spark.sql.session.timeZone` only partially respected


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.3.1
    • Fix Version/s: None
    • Component/s: PySpark

    Description

      PySpark respects the setting `spark.sql.session.timeZone` when converting to and from Pandas, as described here. However, when timestamps are converted directly to Python's `datetime` objects, the setting is ignored and the system's time zone is used instead.

      This can be checked with the following code snippet:

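      import pyspark.sql
      
      # Local session whose SQL session time zone is UTC
      spark = (pyspark
               .sql
               .SparkSession
               .builder
               .master('local[1]')
               .config("spark.sql.session.timeZone", "UTC")
               .getOrCreate()
              )
      
      # One timestamp column; the string is cast to a timestamp in the session time zone (UTC)
      df = spark.createDataFrame([("2018-06-01 01:00:00",)], ["ts"])
      df = df.withColumn("ts", df["ts"].astype("timestamp"))
      
      print(df.toPandas().iloc[0,0])  # conversion via Pandas
      print(df.collect()[0][0])       # direct conversion to a Python datetime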
      

      On my machine this prints the following (the exact result depends on the system time zone; mine is Europe/Berlin):

      2018-06-01 01:00:00
      2018-06-01 03:00:00
      

      Hence, `toPandas` respected the time zone setting (UTC), but `collect` ignored it and converted the timestamp to my system's time zone.
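The two printed values are the same instant rendered in two different time zones. Below is a minimal standard-library sketch of that arithmetic. It assumes, as PySpark's `TimestampType` does, that the stored value is microseconds since the Unix epoch. The helper `render` is hypothetical, and a fixed UTC+2 offset stands in for Europe/Berlin, which observes CEST on 2018-06-01:

```python
from datetime import datetime, timedelta, timezone

# Assumed internal representation: microseconds since the Unix epoch (UTC).
# 2018-06-01 01:00:00 UTC, the value from the report:
ts_micros = 1527814800 * 1_000_000


def render(ts: int, tz: timezone) -> datetime:
    """Hypothetical helper: show the stored instant as naive wall-clock time in `tz`."""
    seconds, micros = divmod(ts, 1_000_000)
    aware = datetime.fromtimestamp(seconds, tz).replace(microsecond=micros)
    return aware.replace(tzinfo=None)


utc = timezone.utc
# Fixed UTC+2 offset standing in for Europe/Berlin in summer (CEST)
cest = timezone(timedelta(hours=2))

print(render(ts_micros, utc))   # 2018-06-01 01:00:00 (session time zone, as toPandas shows)
print(render(ts_micros, cest))  # 2018-06-01 03:00:00 (system time zone, as collect shows)
```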

      The cause of this behaviour is that the methods `toInternal` and `fromInternal` of PySpark's `TimestampType` class do not take the setting `spark.sql.session.timeZone` into account and use the system time zone instead.
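For reference, here is a paraphrase of roughly what `TimestampType.toInternal` and `fromInternal` do in PySpark 2.3. It is not the verbatim source, and the function names below are hypothetical stand-ins. Naive datetimes pass through `time.mktime` and `datetime.datetime.fromtimestamp`, and both of those use the Python process's local time zone. Nothing in either function reads the session configuration:

```python
import calendar
import datetime
import time

# Paraphrase of PySpark 2.3's TimestampType conversions (not the verbatim source).
# Neither function consults spark.sql.session.timeZone.


def to_internal(dt: datetime.datetime) -> int:
    """Python datetime -> microseconds since the Unix epoch."""
    if dt.tzinfo is not None:
        # Aware datetimes are converted through their own UTC offset.
        seconds = calendar.timegm(dt.utctimetuple())
    else:
        # Naive datetimes are interpreted in the *system* local time zone.
        seconds = time.mktime(dt.timetuple())
    return int(seconds) * 1_000_000 + dt.microsecond


def from_internal(ts: int) -> datetime.datetime:
    """Microseconds since the epoch -> naive datetime in the *system* local time zone."""
    return datetime.datetime.fromtimestamp(ts // 1_000_000).replace(microsecond=ts % 1_000_000)


utc_dt = datetime.datetime(2018, 6, 1, 1, 0, tzinfo=datetime.timezone.utc)
ts = to_internal(utc_dt)   # 1527814800000000 on any system
print(from_internal(ts))   # depends on the process's local time zone, e.g. 03:00 under Europe/Berlin
```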

      If the maintainers agree that this should be fixed, I would try to come up with a patch. 
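One possible direction for such a patch is sketched below under two assumptions. First, the caller looks up the session time zone and passes it in as a `datetime.tzinfo`, for example `datetime.timezone.utc` or `zoneinfo.ZoneInfo("Europe/Berlin")` on Python 3.9+. Pytz zones would need `localize` instead of `replace`. Second, the function names are hypothetical and do not correspond to any PySpark API:

```python
import datetime

_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_internal_session(dt: datetime.datetime, session_tz: datetime.tzinfo) -> int:
    """Hypothetical: interpret naive datetimes in the session time zone, not the system one."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=session_tz)
    # Integer arithmetic on the timedelta avoids float rounding.
    delta = dt - _EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def from_internal_session(ts: int, session_tz: datetime.tzinfo) -> datetime.datetime:
    """Hypothetical: render the stored instant as naive wall-clock time in the session time zone."""
    aware = _EPOCH + datetime.timedelta(microseconds=ts)
    return aware.astimezone(session_tz).replace(tzinfo=None)


utc = datetime.timezone.utc
ts = to_internal_session(datetime.datetime(2018, 6, 1, 1, 0), utc)
print(ts)                              # 1527814800000000
print(from_internal_session(ts, utc))  # 2018-06-01 01:00:00, independent of the system zone
```

Because naive values are anchored to the passed-in zone rather than the system zone, `collect` would then report the same wall-clock time as `toPandas` in the example above.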

       

       


            People

              Assignee: Unassigned
              Reporter: Anton Daitche (adaitche)
              Votes: 0
              Watchers: 3
