Description
import datetime
import pandas as pd
import os

dt = [datetime.datetime(2015, 10, 31, 22, 30)]
pdf = pd.DataFrame({'time': dt})
os.environ['TZ'] = 'America/New_York'
df1 = spark.createDataFrame(pdf)
df1.show()

+-------------------+
|               time|
+-------------------+
|2015-10-31 21:30:00|
+-------------------+
This seems to be related to this line:
https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L1776
It appears to be an issue with "tzlocal()".
Wrong:
from_tz = "America/New_York"
to_tz = "tzlocal()"

s.apply(
    lambda ts: ts.tz_localize(from_tz, ambiguous=False)
                 .tz_convert(to_tz)
                 .tz_localize(None)
    if ts is not pd.NaT else pd.NaT)

0   2015-10-31 21:30:00
Name: time, dtype: datetime64[ns]
Correct:
from_tz = "America/New_York"
to_tz = "America/New_York"

s.apply(
    lambda ts: ts.tz_localize(from_tz, ambiguous=False)
                 .tz_convert(to_tz)
                 .tz_localize(None)
    if ts is not pd.NaT else pd.NaT)

0   2015-10-31 22:30:00
Name: time, dtype: datetime64[ns]
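A standalone pandas-only sketch of the behavior above (my own reproduction, not the Spark source; the helper name to_local_naive is made up for illustration): when from_tz and to_tz name the same zone, the wall-clock time round-trips unchanged, but converting to any other zone shifts it by the UTC-offset difference, which is the shift seen in the createDataFrame output.

```python
import pandas as pd

s = pd.Series([pd.Timestamp(2015, 10, 31, 22, 30)])

def to_local_naive(series, from_tz, to_tz):
    # Interpret naive timestamps as from_tz, convert to to_tz,
    # then drop the tzinfo again (mirrors the conversion shown above).
    return series.apply(
        lambda ts: ts.tz_localize(from_tz, ambiguous=False)
                     .tz_convert(to_tz)
                     .tz_localize(None)
        if ts is not pd.NaT else pd.NaT)

# Same zone on both sides: wall clock preserved (22:30 stays 22:30).
same = to_local_naive(s, "America/New_York", "America/New_York")

# Different target zone: wall clock shifts. On 2015-10-31 New York is
# on EDT (UTC-4), so 22:30 local becomes 02:30 UTC the next day.
shifted = to_local_naive(s, "America/New_York", "UTC")
```

This is why the result depends on what tzlocal() resolves to at conversion time rather than on the zone the data was created in.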