  1. Apache Arrow
  2. ARROW-1680

[Python] Timestamp unit change not done in from_pandas() conversion

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Python
    • Labels: None

      Description

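      Calling Array.from_pandas with a pandas.Series of timestamps in 'ns' unit while specifying a target type in 'us' unit causes problems. When the series has timestamps with a timezone, the requested unit is ignored and the resulting array keeps the 'ns' unit. When the series does not have a timezone, the requested unit is applied to the array type, but printing the array raises an OverflowError, which suggests the underlying values were not rescaled to match the new unit.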

      >>> import pandas as pd
      >>> import pyarrow as pa
      >>> from datetime import datetime
      >>> s = pd.Series([datetime.now()])
      >>> s_nyc = s.dt.tz_localize('tzlocal()').dt.tz_convert('America/New_York')
      >>> arr = pa.Array.from_pandas(s_nyc, type=pa.timestamp('us', tz='America/New_York'))
      >>> arr.type
      TimestampType(timestamp[ns, tz=America/New_York])
      >>> arr = pa.Array.from_pandas(s, type=pa.timestamp('us'))
      >>> arr.type
      TimestampType(timestamp[us])
      >>> print(arr)
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "pyarrow/array.pxi", line 295, in pyarrow.lib.Array.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:26221)
          values = array_format(self, window=10)
        File "pyarrow/formatting.py", line 28, in array_format
          values.append(value_format(x, 0))
        File "pyarrow/formatting.py", line 49, in value_format
          return repr(x)
        File "pyarrow/scalar.pxi", line 63, in pyarrow.lib.ArrayValue.__repr__ (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:19535)
          return repr(self.as_py())
        File "pyarrow/scalar.pxi", line 240, in pyarrow.lib.TimestampValue.as_py (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:21600)
          return converter(value, tzinfo=tzinfo)
        File "pyarrow/scalar.pxi", line 204, in pyarrow.lib.lambda5 (/home/bryan/git/arrow/python/build/temp.linux-x86_64-2.7/lib.cxx:7295)
          TimeUnit_MICRO: lambda x, tzinfo: pd.Timestamp(
        File "pandas/_libs/tslib.pyx", line 402, in pandas._libs.tslib.Timestamp.__new__ (pandas/_libs/tslib.c:10051)
        File "pandas/_libs/tslib.pyx", line 1467, in pandas._libs.tslib.convert_to_tsobject (pandas/_libs/tslib.c:27665)
      OverflowError: Python int too large to convert to C long
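
A possible reading of the traceback above, offered as an assumption and not confirmed by this report, is that the stored integers are still nanoseconds since the epoch while the type says microseconds. The sketch below uses only the standard library to show why such a mislabeled value would fall outside the representable date range; the helper names to_epoch_ns and from_epoch_us are hypothetical and are not part of pyarrow or pandas.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def to_epoch_ns(dt):
    """Return integer nanoseconds since the Unix epoch for a naive datetime."""
    delta = dt - EPOCH
    micros = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * 1000


def from_epoch_us(value):
    """Interpret an integer as microseconds since the Unix epoch."""
    return EPOCH + timedelta(microseconds=value)


# Timestamp taken from the workaround output in this report.
stamp = datetime(2017, 10, 17, 11, 4, 44, 308233)
ns_value = to_epoch_ns(stamp)

# Rescaling ns -> us before interpreting the integer round-trips cleanly.
rescaled = from_epoch_us(ns_value // 1000)

# Reading the ns integer as if it were us lands far beyond year 9999,
# so the standard library raises OverflowError here.
try:
    from_epoch_us(ns_value)
    mislabeled_overflows = False
except OverflowError:
    mislabeled_overflows = True
```

The same factor of 1000 would explain why converting the values first, so that the integers and the declared unit agree, avoids the error.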
      

      A workaround is to convert the values manually with astype before calling from_pandas:

      >>> arr = pa.Array.from_pandas(s.values.astype('datetime64[us]'))
      >>> arr.type
      TimestampType(timestamp[us])
      >>> print(arr)
      <pyarrow.lib.TimestampArray object at 0x7f6a67e0a3c0>
      [
        Timestamp('2017-10-17 11:04:44.308233')
      ]
      

              People

              • Assignee: Wes McKinney (wesmckinn)
              • Reporter: Bryan Cutler (bryanc)