When converting between Pandas DataFrame/Series and Spark DataFrame using toPandas() or pandas udfs, timestamp values respect the Python system timezone instead of the session timezone.
For example, let's say we use "America/Los_Angeles" as the session timezone and have a timestamp value "1970-01-01 00:00:01" in that timezone. Btw, I'm in Japan, so the Python timezone would be "Asia/Tokyo".
The timestamp value from current toPandas() will be the following:
The value becomes "1970-01-01 17:00:01" because it respects the Python timezone: 00:00:01 in "America/Los_Angeles" is 08:00:01 UTC, which is 17:00:01 in "Asia/Tokyo".
As we discussed in https://github.com/apache/spark/pull/18664, we consider this behavior a bug, and the value should be "1970-01-01 00:00:01".
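The 17-hour shift can be reproduced without Spark. The following is a minimal sketch using only the Python standard library, not the actual toPandas() code path. It assumes fixed UTC offsets in place of the named zones (UTC-8 for "America/Los_Angeles" on that date, UTC+9 for "Asia/Tokyo"), and the variable names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for the named zones on 1970-01-01
# (America/Los_Angeles was on PST, UTC-8; Asia/Tokyo is UTC+9).
SESSION_TZ = timezone(timedelta(hours=-8), "America/Los_Angeles")
PYTHON_TZ = timezone(timedelta(hours=9), "Asia/Tokyo")

# The timestamp as it reads in the session timezone.
session_value = datetime(1970, 1, 1, 0, 0, 1, tzinfo=SESSION_TZ)

# Same instant rendered as wall-clock time in the Python system timezone.
python_local = session_value.astimezone(PYTHON_TZ).replace(tzinfo=None)

# Wall-clock time in the session timezone, which is the value we want.
expected = session_value.replace(tzinfo=None)

print(python_local)  # 1970-01-01 17:00:01
print(expected)      # 1970-01-01 00:00:01
```

The first value corresponds to what the current behavior produces, and the second to what the session timezone says it should be.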