SPARK-32123

[Python] Setting `spark.sql.session.timeZone` only partially respected


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0
    • Fix Version/s: None
    • Component/s: PySpark
    • Labels: None

    Description

      Reopening SPARK-25244 as it is unresolved as of versions 2.4.6 and 3.0.0.

      The setting spark.sql.session.timeZone is respected by PySpark when converting from and to Pandas, as described here. However, when timestamps are converted directly to Python's datetime objects, it is ignored and the system's timezone is used.

      This can be checked with the following code snippet:

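      import pyspark.sql

      # Start a local session with the session timezone explicitly set to UTC.
      spark = (pyspark
               .sql
               .SparkSession
               .builder
               .master('local[1]')
               .config("spark.sql.session.timeZone", "UTC")
               .getOrCreate()
              )

      # Build a one-row DataFrame and cast the string column to a timestamp.
      df = spark.createDataFrame([("2018-06-01 01:00:00",)], ["ts"])
      df = df.withColumn("ts", df["ts"].astype("timestamp"))

      # Conversion via Pandas: the session timezone is respected.
      print(df.toPandas().iloc[0, 0])
      # Direct conversion to a Python datetime: the system timezone is used.
      print(df.collect()[0][0])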
      

      For me this prints the following (the exact result depends on your system's timezone; mine is Europe/Berlin):

      2018-06-01 01:00:00
      2018-06-01 03:00:00
      

      Hence, the method toPandas respected the timezone setting (UTC), but the method collect ignored it and converted the timestamp to my system's timezone.

      The cause of this behaviour is that the toInternal and fromInternal methods of PySpark's TimestampType class do not take the spark.sql.session.timeZone setting into account and use the system timezone instead.
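
The difference can be illustrated without Spark, using only the Python standard library. This is a minimal sketch of the mechanism, not PySpark's actual source: it assumes timestamps are held internally as microseconds since the epoch, and the function names from_internal_system_tz and from_internal_session_tz are hypothetical, chosen only for this illustration.

```python
import calendar
import datetime

# 2018-06-01 01:00:00 UTC as microseconds since the epoch
# (the internal representation assumed in this sketch).
MICROS = calendar.timegm((2018, 6, 1, 1, 0, 0)) * 1_000_000


def from_internal_system_tz(micros):
    """Naive datetime in the *system* timezone (the behaviour reported here)."""
    seconds, fraction = divmod(micros, 1_000_000)
    # Without a tz argument, fromtimestamp() uses the machine's local timezone.
    return datetime.datetime.fromtimestamp(seconds).replace(microsecond=fraction)


def from_internal_session_tz(micros, session_tz):
    """Hypothetical alternative: naive datetime in an explicitly given timezone."""
    seconds, fraction = divmod(micros, 1_000_000)
    aware = datetime.datetime.fromtimestamp(seconds, tz=session_tz)
    # Drop tzinfo so the result is a naive wall-clock time in session_tz.
    return aware.replace(microsecond=fraction, tzinfo=None)


# Result depends on the timezone of the machine running the code.
print(from_internal_system_tz(MICROS))
# Result is fixed by the explicit timezone argument.
print(from_internal_session_tz(MICROS, datetime.timezone.utc))  # 2018-06-01 01:00:00
```

The first call varies with the machine it runs on, which matches the output of collect above; the second is determined only by the timezone passed in, which is what honouring spark.sql.session.timeZone would require.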

       

       


            People

              Assignee: Unassigned
              Reporter: Toby Harradine (toby.harradine)