Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Description
As a user, I wish I could use pyarrow to store a column of datetimes with different timezones. For certain datasets, it is ideal to have a column with mixed timezones (e.g. taxi pickups). Even if the data is limited to a single location (say, a business in NYC) over the span of a single year, the timezones will be EDT and EST, with offsets of -4:00 and -5:00.
Currently, it is not possible to keep a column with different timezones.
import pyarrow as pa
import pytz
from datetime import datetime

# pytz zones are attached with localize(); passing them as tzinfo= would use the zone's LMT offset
eastern = pytz.timezone('US/Eastern')
pacific = pytz.timezone('US/Pacific')

arr = pa.array([
    eastern.localize(datetime(year=2010, month=1, day=1, hour=9, minute=0, second=0)),
    pacific.localize(datetime(year=2010, month=1, day=1, hour=6, minute=0, second=0)),
])

# value at index 0: 9AM ET
# value at index 1: 6AM PT, which is 9AM ET
assert arr[0].as_py().hour == 9  # fail
assert arr[1].as_py().hour == 9  # fail
assert arr[0].as_py().hour == 6  # fail
assert arr[1].as_py().hour == 6  # fail
> Both datetime values represent the same instant (although in different timezones)
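As a possible interim workaround (a minimal sketch, not something pyarrow provides), each value could be split into a UTC instant and a per-row offset, which could then be stored as two ordinary columns. The sketch uses only the standard library and assumes fixed offsets of -5:00 and -8:00 for 2010-01-01 rather than a timezone database; `split_instant_and_offset` and `restore` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets in effect on 2010-01-01 (standard time). Assumed here
# instead of a tz database so the sketch needs only the standard library.
EASTERN = timezone(timedelta(hours=-5), "EST")
PACIFIC = timezone(timedelta(hours=-8), "PST")

pickups = [
    datetime(2010, 1, 1, 9, 0, tzinfo=EASTERN),
    datetime(2010, 1, 1, 6, 0, tzinfo=PACIFIC),
]

# Aware datetimes compare by instant, so these two values are equal.
same_instant = pickups[0] == pickups[1]


def split_instant_and_offset(dt):
    """Hypothetical helper: return (UTC datetime, UTC offset in minutes)."""
    offset_minutes = int(dt.utcoffset().total_seconds() // 60)
    return dt.astimezone(timezone.utc), offset_minutes


def restore(utc_dt, offset_minutes):
    """Hypothetical helper: rebuild the original local wall-clock time."""
    return utc_dt.astimezone(timezone(timedelta(minutes=offset_minutes)))


columns = [split_instant_and_offset(dt) for dt in pickups]
utc_values = [utc for utc, _ in columns]   # candidate UTC timestamp column
offsets = [off for _, off in columns]      # candidate integer column
local_hours = [restore(utc, off).hour for utc, off in columns]
```

The trade-off is that a numeric offset preserves the wall-clock time of each row but not the zone name or its DST rules. Storing a zone name per row instead would keep that information at the cost of a tz database lookup on read.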