Apache Arrow / ARROW-5359

[Python] timestamp_as_object support for pa.Table.to_pandas in pyarrow


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 1.0.0
    • Component/s: Python
    • Environment: Ubuntu

    Description

      Creating a ticket for an issue reported on GitHub (https://github.com/apache/arrow/issues/4284):

      pyarrow: Issue with timestamp conversion from Arrow to pandas

      pyarrow's Table.to_pandas has a date_as_object option but no equivalent option for timestamps. When a timestamp column in an Arrow table is converted to pandas, the target dtype is datetime64[ns] (elements are pd.Timestamp), which stores nanoseconds since the epoch in a signed 64-bit integer and therefore cannot represent times later than 2262-04-11 23:47:16.854775807. In the scenario below, dates past that bound are silently converted to incorrect values instead of raising an error. Adding a timestamp_as_object option to pa.Table.to_pandas, returning datetime.datetime objects instead, would handle this case.
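The 2262 bound follows directly from the storage format: datetime64[ns] values are nanoseconds since 1970-01-01 held in a signed 64-bit integer. The minimal stdlib-only sketch below computes the largest representable instant and checks the dates from the report against the int64 range. The helper names `ns_to_str` and `fits_in_int64_ns` are illustrative only, not part of pandas or pyarrow.

```python
from datetime import datetime, timedelta

INT64_MIN = -(2**63)           # pandas reserves this value for NaT, so its real minimum is one ns higher
INT64_MAX = 2**63 - 1          # largest signed 64-bit integer
EPOCH = datetime(1970, 1, 1)   # naive epoch used by datetime64

def ns_to_str(ns_since_epoch: int) -> str:
    """Render nanoseconds since the epoch with full 9-digit precision."""
    # divmod floors, so frac_ns stays non-negative even for pre-1970 values
    seconds, frac_ns = divmod(ns_since_epoch, 10**9)
    dt = EPOCH + timedelta(seconds=seconds)
    return f"{dt:%Y-%m-%d %H:%M:%S}.{frac_ns:09d}"

def fits_in_int64_ns(dt: datetime) -> bool:
    """True if dt can be stored as int64 nanoseconds since the epoch."""
    micros = (dt - EPOCH) // timedelta(microseconds=1)  # exact integer microseconds
    return INT64_MIN <= micros * 1000 <= INT64_MAX

print(ns_to_str(INT64_MAX))                             # 2262-04-11 23:47:16.854775807
print(fits_in_int64_ns(datetime(2262, 4, 11)))          # True
print(fits_in_int64_ns(datetime(3000, 12, 31, 12, 0)))  # False
```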

```python
# Python 3.6.8
>>> import pandas as pd
>>> import pyarrow as pa
>>> pd.__version__
'0.24.1'
>>> pa.__version__
'0.13.0'
>>> import datetime
>>> df = pd.DataFrame({"test_date": [datetime.datetime(3000, 12, 31, 12, 0),
...                                  datetime.datetime(3100, 12, 31, 12, 0)]})
>>> df
             test_date
0  3000-12-31 12:00:00
1  3100-12-31 12:00:00
>>> pa_table = pa.Table.from_pandas(df)
>>> pa_table[0]
Column name='test_date' type=TimestampType(timestamp[us])
[
  [
    32535172800000000,
    35690846400000000
  ]
]
>>> pa_table.to_pandas()
                      test_date
0 1831-11-22 12:50:52.580896768
1 1931-11-22 12:50:52.580896768
```
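The wrong dates are consistent with the microsecond-to-nanosecond conversion overflowing and wrapping around modulo 2^64 rather than raising an error. The sketch below assumes that mechanism (it models unchecked C/NumPy-style int64 arithmetic; it is not pyarrow's actual code path) and reproduces the output above from the raw timestamp[us] values. For contrast, it also converts the same values to datetime.datetime objects, which is the kind of result the requested option returns; the fix in pyarrow 1.0.0 exposes it as `to_pandas(timestamp_as_object=True)`. All helper names here are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def wrap_int64(value: int) -> int:
    """Reduce an unbounded Python int to signed 64-bit two's complement."""
    return (value + 2**63) % 2**64 - 2**63

def us_to_ns_unchecked(us: int) -> int:
    """Assumed model of the faulty cast: multiply by 1000, let int64 overflow wrap."""
    return wrap_int64(us * 1000)

def ns_to_str(ns_since_epoch: int) -> str:
    """Render nanoseconds since the epoch with full 9-digit precision."""
    seconds, frac_ns = divmod(ns_since_epoch, 10**9)
    dt = EPOCH + timedelta(seconds=seconds)
    return f"{dt:%Y-%m-%d %H:%M:%S}.{frac_ns:09d}"

def us_to_datetime(us: int) -> datetime:
    """Object-style conversion: datetime.datetime covers years 1-9999 at us precision."""
    return EPOCH + timedelta(microseconds=us)

raw_us = [32535172800000000, 35690846400000000]  # pa_table[0] values from the report

for us in raw_us:
    print(ns_to_str(us_to_ns_unchecked(us)), "|", us_to_datetime(us))
# 1831-11-22 12:50:52.580896768 | 3000-12-31 12:00:00
# 1931-11-22 12:50:52.580896768 | 3100-12-31 12:00:00
```

Both wrapped values come out negative (before 1970), which is why `divmod` is used: it floors toward negative infinity, so the nanosecond fraction stays non-negative and the rendered time of day matches what pandas prints.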


    People

      Assignee: Unassigned
      Reporter: Joe Muruganandam (joetl)
      Votes: 0
      Watchers: 7


    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 5h 50m