The setting `spark.sql.session.timeZone` is respected by PySpark when converting to and from Pandas, as described here. However, when timestamps are converted directly to Python `datetime` objects, the setting is ignored and the system timezone is used instead.
This can be verified with the following code snippet:
For me, this prints the following (the exact result depends on your system's timezone; mine is Europe/Berlin):
Hence, the method `toPandas` respected the timezone setting (UTC), but the method `collect` ignored it and converted the timestamp to my system's timezone.
The cause of this behaviour is that the methods `toInternal` and `fromInternal` of PySpark's `TimestampType` class do not take the setting `spark.sql.session.timeZone` into account and use the system timezone instead.
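The effect can be illustrated with the standard library alone. The sketch below is a simplified version of what `fromInternal` effectively does (converting microseconds since the epoch with `datetime.fromtimestamp`, which consults the system timezone); it is not the exact PySpark source.

```python
from datetime import datetime, timezone

# Spark stores timestamps internally as microseconds since the epoch.
internal = 1_500_000_000 * 1_000_000  # 2017-07-14 02:40:00 UTC

# Roughly what TimestampType.fromInternal does: a naive datetime
# produced via the *system* timezone, regardless of the session setting.
local_naive = datetime.fromtimestamp(internal / 1_000_000)

# What a session-timezone-aware conversion would yield for a UTC session:
utc_aware = datetime.fromtimestamp(internal / 1_000_000, tz=timezone.utc)

print(local_naive)  # depends on the system timezone
print(utc_aware)    # 2017-07-14 02:40:00+00:00
```

A fix would presumably need `fromInternal` (and `toInternal`) to consult the session timezone rather than relying on the interpreter's default.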
If the maintainers agree that this should be fixed, I would try to come up with a patch.