Description
Type inference in pa.array appears to ignore the tzinfo on datetime objects: timezone-aware datetimes are stored in naive timestamp columns instead of timezone-aware ones.
Python code:
import datetime

import pytz
import pyarrow as pa

naive_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10)
utc_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10, tzinfo=pytz.utc)
tzaware_datetime = utc_datetime.astimezone(pytz.timezone('America/Los_Angeles'))


def inspect(varname):
    print(varname)
    arr = globals()[varname]
    print(arr.type)
    print(arr)
    print()


auto_naive_arr = pa.array([naive_datetime])
inspect("auto_naive_arr")

auto_utc_arr = pa.array([utc_datetime])
inspect("auto_utc_arr")

auto_tzaware_arr = pa.array([tzaware_datetime])
inspect("auto_tzaware_arr")

auto_mixed_arr = pa.array([utc_datetime, tzaware_datetime])
inspect("auto_mixed_arr")

naive_type = pa.timestamp("us", naive_datetime.tzname())
utc_type = pa.timestamp("us", utc_datetime.tzname())
tzaware_type = pa.timestamp("us", tzaware_datetime.tzname())

naive_arr = pa.array([naive_datetime], type=naive_type)
inspect("naive_arr")

utc_arr = pa.array([utc_datetime], type=utc_type)
inspect("utc_arr")

tzaware_arr = pa.array([tzaware_datetime], type=tzaware_type)
inspect("tzaware_arr")

mixed_arr = pa.array([utc_datetime, tzaware_datetime], type=utc_type)
inspect("mixed_arr")
This prints:
$ python detect_timezone.py
auto_naive_arr
timestamp[us]
[
  1547381470000000
]

auto_utc_arr
timestamp[us]
[
  1547381470000000
]

auto_tzaware_arr
timestamp[us]
[
  1547352670000000
]

auto_mixed_arr
timestamp[us]
[
  1547381470000000,
  1547352670000000
]

naive_arr
timestamp[us]
[
  1547381470000000
]

utc_arr
timestamp[us, tz=UTC]
[
  1547381470000000
]

tzaware_arr
timestamp[us, tz=PST]
[
  1547352670000000
]

mixed_arr
timestamp[us, tz=UTC]
[
  1547381470000000,
  1547352670000000
]
But I would expect the following types instead:
- auto_naive_arr: timestamp[us]
- auto_utc_arr: timestamp[us, tz=UTC]
- auto_tzaware_arr: timestamp[us, tz=PST] (Or maybe tz='America/Los_Angeles'. I'm not sure why pytz returns PST as the tzname)
- auto_mixed_arr: timestamp[us, tz=UTC]
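On the PST question: datetime.tzname() returns the abbreviation in effect at that instant, not the IANA zone name. The sketch below shows one way a timezone label could be chosen for a column. The helper name describe_tz is hypothetical and is not part of pyarrow. It relies on pytz zones exposing the zone name as .zone and zoneinfo.ZoneInfo exposing it as .key, and falls back to tzname() otherwise. A fixed-offset stdlib timezone labelled "PST" stands in for the pytz zone so that the snippet runs without third-party packages.

```python
import datetime


def describe_tz(dt):
    """Return a best-effort timezone label for dt, or None if dt is naive.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    tz = dt.tzinfo
    if tz is None:
        return None
    # pytz zones carry the IANA name in `.zone`; zoneinfo.ZoneInfo uses `.key`.
    for attr in ("zone", "key"):
        name = getattr(tz, attr, None)
        if isinstance(name, str):
            return name
    # Otherwise fall back to the abbreviation in effect at this instant.
    return dt.tzname()


naive = datetime.datetime(2019, 1, 13, 12, 11, 10)
utc = datetime.datetime(2019, 1, 13, 12, 11, 10, tzinfo=datetime.timezone.utc)
# Fixed-offset stand-in for America/Los_Angeles in January (assumption).
pst_fixed = utc.astimezone(datetime.timezone(datetime.timedelta(hours=-8), "PST"))

for value in (naive, utc, pst_fixed):
    print(describe_tz(value))
```

With a pytz or zoneinfo zone attached, the same helper would return 'America/Los_Angeles' rather than the abbreviation.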
Also, in the "mixed" case, I'd expect the actual stored microseconds to be the same for both rows, since utc_datetime and tzaware_datetime both refer to the same point in time. It seems reasonable for any naive datetime objects mixed in with tz-aware datetimes to be interpreted as UTC.
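To illustrate the expectation above, here is a minimal stdlib-only sketch that normalizes each datetime to UTC before computing microseconds since the epoch. The helper name to_utc_micros is hypothetical, and treating naive values as UTC is the assumption proposed in this report, not confirmed pyarrow behavior. The fixed -8 hour offset stands in for America/Los_Angeles on this date.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_utc_micros(dt):
    """Microseconds since the Unix epoch for dt.

    Naive datetimes are interpreted as UTC (assumption taken from this report).
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    # Subtracting two aware datetimes compares instants, not wall-clock fields.
    return (dt - EPOCH) // datetime.timedelta(microseconds=1)


naive_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10)
utc_datetime = naive_datetime.replace(tzinfo=datetime.timezone.utc)
# Fixed-offset stand-in for America/Los_Angeles in January (assumption).
pst = datetime.timezone(datetime.timedelta(hours=-8), "PST")
tzaware_datetime = utc_datetime.astimezone(pst)

for value in (naive_datetime, utc_datetime, tzaware_datetime):
    print(to_utc_micros(value))
```

All three calls give 1547381470000000. The value 1547352670000000 in the output above is 28,800 seconds (8 hours) lower, which is consistent with the local wall-clock fields being stored without applying the UTC offset.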
Issue Links
- is fixed by: ARROW-9528 [Python] Honor tzinfo information when converting from datetime to pyarrow (Resolved)
- relates to: ARROW-9528 [Python] Honor tzinfo information when converting from datetime to pyarrow (Resolved)