Apache Arrow / ARROW-1958

[Python] Error in pandas conversion for datetimetz row index

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.0
    • Component/s: Python
    • Environment: Ubuntu 16.04

      Description

      Converting a Table with a datetimetz row index to pandas gives wrong results for non-UTC time zones. The index values are stored as datetime64[ns] and are interpreted directly as datetime64[ns, tz], when they should be interpreted as datetime64[ns, UTC] and then converted to datetime64[ns, tz]. Column.to_pandas handles time zones correctly for columns, but table_to_blockmanager does not do the same for the row index.

      This is a minimal example demonstrating the failure of a roundtrip between a DataFrame and a Table:

      import pandas as pd
      import pyarrow as pa
      
      df = pd.DataFrame({
          'a': pd.date_range(
              start='2017-01-01', periods=3, tz='America/New_York'
          )
      })
      df = df.set_index('a')
      df_pa = pa.Table.from_pandas(df).to_pandas()
      
      print(df)
      print(df_pa)
      

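      The output is shown below. The first frame is the original; the second, after the roundtrip, has its index shifted by five hours (05:00 instead of 00:00):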

      Empty DataFrame
      Columns: []
      Index: [2017-01-01 00:00:00-05:00, 2017-01-02 00:00:00-05:00, 2017-01-03 00:00:00-05:00]
      Empty DataFrame
      Columns: []
      Index: [2017-01-01 05:00:00-05:00, 2017-01-02 05:00:00-05:00, 2017-01-03 05:00:00-05:00]
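
The five-hour shift in the output above can be reproduced with pandas alone. The following is a minimal sketch, not the pyarrow code path itself: it assumes the index values are held as naive datetime64[ns] values in UTC (the variable names are illustrative), and compares localizing them directly to the target zone with localizing to UTC and then converting.

```python
import pandas as pd

tz = 'America/New_York'
original = pd.date_range(start='2017-01-01', periods=3, tz=tz)

# Assumption for illustration: the stored representation is the UTC
# wall-clock time with the time zone information dropped.
utc_naive = original.tz_convert('UTC').tz_localize(None)

# Interpreting the stored values directly as local time in tz
# reproduces the shifted index from the report.
wrong = utc_naive.tz_localize(tz)

# Interpreting them as UTC first and then converting to tz
# recovers the original index.
right = utc_naive.tz_localize('UTC').tz_convert(tz)

print(wrong)
print(right)
```

Here `wrong` starts at 2017-01-01 05:00:00-05:00, matching the round-tripped index in the report, while `right` matches the original index.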
      

              People

              • Assignee: Albert Shieh (adshieh)
              • Reporter: Albert Shieh (adshieh)
              • Votes: 0
              • Watchers: 4
