Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Calling Array.from_pandas on a pandas.Series of timestamps with unit 'ns' while requesting a target type with unit 'us' misbehaves in two ways. If the series is timezone-aware, the requested unit is ignored and the resulting array keeps the 'ns' unit. If the series is timezone-naive, the array gets the 'us' type, but printing it raises an OverflowError. This suggests the nanosecond values were relabeled as microseconds without being rescaled.
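The overflow can be modeled without Arrow. This sketch rests on two assumptions that the traceback hints at but does not prove: the microsecond converter multiplies the stored integer by 1000 to build a nanosecond pd.Timestamp, and the coerced array kept the original nanosecond integers while only its type changed to 'us'. The helpers ns_to_us and us_to_ns are hypothetical and are not pyarrow internals.

```python
from datetime import datetime, timedelta, timezone

# Illustrative model only: these helpers are hypothetical, not pyarrow internals.
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
NS_PER_US = 1_000


def ns_to_us(value_ns: int) -> int:
    """A real unit cast: rescale nanoseconds to microseconds (drops sub-us digits)."""
    return value_ns // NS_PER_US


def us_to_ns(value_us: int) -> int:
    """Scale microseconds back to nanoseconds, enforcing the int64 range like C code would."""
    value_ns = value_us * NS_PER_US
    if not INT64_MIN <= value_ns <= INT64_MAX:
        raise OverflowError("Python int too large to convert to C long")
    return value_ns


# The timestamp from the workaround output in this report, as ns since the epoch (UTC assumed).
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
stamp = datetime(2017, 10, 17, 11, 4, 44, 308233, tzinfo=timezone.utc)
stamp_ns = (stamp - epoch) // timedelta(microseconds=1) * NS_PER_US

# Proper cast: the value survives the round trip.
assert us_to_ns(ns_to_us(stamp_ns)) == stamp_ns

# Buggy path (assumed): only the type label becomes 'us'; the integer is still in ns.
relabeled_us = stamp_ns
try:
    us_to_ns(relabeled_us)
except OverflowError as exc:
    print("overflow:", exc)
```

A value around 1.5e18 ns, misread as microseconds and multiplied by 1000, lands near 1.5e21, far beyond the int64 maximum of about 9.2e18.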
>>> import pandas as pd
>>> import pyarrow as pa
>>> from datetime import datetime
>>> s = pd.Series([datetime.now()])
>>> s_nyc = s.dt.tz_localize('tzlocal()').dt.tz_convert('America/New_York')
>>> arr = pa.Array.from_pandas(s_nyc, type=pa.timestamp('us', tz='America/New_York'))
>>> arr.type
TimestampType(timestamp[ns, tz=America/New_York])
>>> arr = pa.Array.from_pandas(s, type=pa.timestamp('us'))
>>> arr.type
TimestampType(timestamp[us])
>>> print(arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
    values = array_format(self, window=10)
  File "pyarrow/formatting.py", line 28, in array_format
    values.append(value_format(x, 0))
  File "pyarrow/formatting.py", line 49, in value_format
    return repr(x)
  File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
    return repr(self.as_py())
  File "pyarrow/scalar.pxi", line 240, in pyarrow.lib.TimestampValue.as_py (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:21600)
    return converter(value, tzinfo=tzinfo)
  File "pyarrow/scalar.pxi", line 204, in pyarrow.lib.lambda5 (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:7295)
    TimeUnit_MICRO: lambda x, tzinfo: pd.Timestamp(
  File "pandas/_libs/tslib.pyx", line 402, in pandas._libs.tslib.Timestamp.__new__ (pandas/_libs/tslib.c:10051)
  File "pandas/_libs/tslib.pyx", line 1467, in pandas._libs.tslib.convert_to_tsobject (pandas/_libs/tslib.c:27665)
OverflowError: Python int too large to convert to C long
A workaround is to convert the values to microseconds manually with astype before calling from_pandas:
>>> arr = pa.Array.from_pandas(s.values.astype('datetime64[us]'))
>>> arr.type
TimestampType(timestamp[us])
>>> print(arr)
<pyarrow.lib.TimestampArray object at 0x7f6a67e0a3c0>
[
  Timestamp('2017-10-17 11:04:44.308233')
]
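The workaround succeeds because NumPy's astype('datetime64[us]') rescales the stored int64 values instead of only relabeling them. A minimal NumPy-only sketch of that step, using the timestamp from the output above as a stand-in for s.values (the variable names are illustrative, and Arrow itself is not exercised here):

```python
import numpy as np

# Stand-in for s.values: one naive timestamp at nanosecond resolution
# (value taken from the workaround output in this report).
values_ns = np.array(['2017-10-17T11:04:44.308233'], dtype='datetime64[ns]')

# The workaround's cast: NumPy rescales the underlying integers to microseconds.
values_us = values_ns.astype('datetime64[us]')

# Inspect the raw int64 payloads; the microsecond count is 1000x smaller.
raw_ns = values_ns.view('int64')
raw_us = values_us.view('int64')
print(values_us.dtype, raw_ns[0], raw_us[0])
```

In the report, an array like values_us is what gets passed to pa.Array.from_pandas, so Arrow receives integers that already match the 'us' unit.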
Issue Links
- relates to ARROW-1718 [Python] Implement casts from timestamp to date32/date64 and support in Array.from_pandas (Resolved)