Apache Arrow / ARROW-9962

[Python] Conversion to pandas with index column using fixed timezone fails

    Description

      From https://github.com/pandas-dev/pandas/issues/35997: it seems that the conversion to pandas handles a regular column and an index column differently.

      In [5]: import pandas as pd
         ...: import pyarrow as pa
         ...: from datetime import datetime, timezone
         ...: 
         ...: df = pd.DataFrame([[datetime.now(timezone.utc), datetime.now(timezone.utc)]], columns=['date_index', 'date_column'])
         ...: table = pa.Table.from_pandas(df.set_index('date_index'))
         ...: 
      
      In [6]: table
      Out[6]: 
      pyarrow.Table
      date_column: timestamp[ns, tz=+00:00]
      date_index: timestamp[ns, tz=+00:00]
      
      In [7]: table.to_pandas()
      ...
      UnknownTimeZoneError: '+00:00'
      

      So this happens specifically for "fixed offset" timezones, and only for index columns (e.g. table.select(["date_column"]).to_pandas() works fine).

      It seems this is because, for regular columns, we use our helper make_tz_aware to convert a fixed-offset string such as "+01:00" to a Python timezone object, which pandas understands (pandas does not handle the string itself). For the index column, we fail to do this conversion.
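
      To illustrate the kind of conversion described above, here is a minimal sketch using only the standard library. The function name fixed_offset_to_tzinfo is hypothetical and is not the actual make_tz_aware helper; it assumes the input is always of the form "+HH:MM" or "-HH:MM".

```python
from datetime import timedelta, timezone


def fixed_offset_to_tzinfo(tz_string):
    """Hypothetical helper: convert '+HH:MM' or '-HH:MM' to a datetime.timezone.

    This only sketches the idea of turning a fixed-offset string into a
    tzinfo object; it is not pyarrow's make_tz_aware implementation.
    """
    if len(tz_string) < 2 or tz_string[0] not in "+-":
        raise ValueError(f"not a fixed-offset string: {tz_string!r}")
    sign = -1 if tz_string[0] == "-" else 1
    hours, minutes = tz_string[1:].split(":")
    offset = sign * timedelta(hours=int(hours), minutes=int(minutes))
    return timezone(offset)


# The string from the traceback becomes a tzinfo object with a zero offset.
tz = fixed_offset_to_tzinfo("+00:00")
```

      A tzinfo object like this can be passed wherever a timezone is expected, whereas the raw string "+00:00" is what led to the UnknownTimeZoneError in the report.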

            People

              jorisvandenbossche Joris Van den Bossche
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 3h