Apache Arrow / ARROW-16547

[Python] to_pandas fails with FixedOffset timezones when timestamp_as_object is used


Details

    Description

      The `to_pandas` method fails with "ValueError: fromutc: dt.tzinfo is not self" when `timestamp_as_object=True` is passed and the timestamp column uses a fixed-offset timezone, e.g. "+08:00".

      Repro script attached.


      The problem seems to be that `fromutc` is called directly on the tzinfo object here, which does not work when that object is a `pytz._FixedOffset`: https://github.com/apache/arrow/blob/90aac16761b7dbf5fe931bc8837cad5116939270/cpp/src/arrow/python/arrow_to_pandas.cc#L1068

      import pyarrow as pa
      import datetime as dt
      import pytz
      
      tz = pytz.FixedOffset(120)
      ts = tz.localize(dt.datetime(2022, 5, 12, 16, 57))
      
      timestamps = pa.array([ts])
      names = ["timestamp_col"]
      table = pa.Table.from_arrays([timestamps], names=names)
      
      print(table.schema)
      
      # Works fine
      print(table.to_pandas())
      
      # Fails with "ValueError: fromutc: dt.tzinfo is not self"
      table.to_pandas(timestamp_as_object=True)
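
      The suspected mechanism can be sketched without pyarrow or pytz. This is a minimal illustration, assuming that `pytz._FixedOffset` does not override `fromutc` and so falls back to the default `datetime.tzinfo.fromutc`, which rejects any datetime whose tzinfo is not the object itself. `FixedOffsetLike` is a hypothetical stand-in written for this example, not a pytz or Arrow class, and the `astimezone` route shown at the end is only one possible way around the error, not necessarily how it should be fixed in Arrow.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class FixedOffsetLike(tzinfo):
    """Hypothetical fixed-offset tzinfo that does NOT override fromutc."""

    def __init__(self, minutes):
        self._offset = timedelta(minutes=minutes)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return None


tz = FixedOffsetLike(120)
naive_utc = datetime(2022, 5, 12, 14, 57)  # UTC wall time, no tzinfo attached

# The default tzinfo.fromutc requires dt.tzinfo to be the tzinfo object itself,
# so passing a naive datetime raises ValueError.
try:
    tz.fromutc(naive_utc)
    fromutc_failed = False
except ValueError:
    fromutc_failed = True

# Attaching UTC first and converting with astimezone avoids the direct call.
aware = naive_utc.replace(tzinfo=timezone.utc).astimezone(tz)
```

      Here `fromutc_failed` ends up `True`, while `aware` carries the +02:00 offset and the local wall time used in the repro script above.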
      

      Attachments

        1. pyarrow_to_pandas_repro.py
          0.4 kB
          Sander Goos

            People

              alenka Alenka Frim
              sgoos-db Sander Goos

                Time Tracking

                  Original Estimate: 24h
                  Time Spent: 3h 20m
                  Remaining Estimate: 20h 40m