Apache Arrow / ARROW-1958

[Python] Error in pandas conversion for datetimetz row index

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.0
    • Component/s: Python
    • Environment: Ubuntu 16.04

    Description

      Converting a Table with a datetimetz row index to pandas gives wrong values for non-UTC time zones. The index values are stored as datetime64[ns] and are interpreted directly as datetime64[ns, tz], when they should be interpreted as datetime64[ns, UTC] and then converted to datetime64[ns, tz]. Time zones are handled correctly for columns in Column.to_pandas, but not for the row index in table_to_blockmanager.

      This is a minimal example demonstrating the failure of a roundtrip between a DataFrame and a Table:

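      import pandas as pd
      import pyarrow as pa

      # Build a DataFrame whose row index is a tz-aware (non-UTC) DatetimeIndex.
      df = pd.DataFrame({
          'a': pd.date_range(
              start='2017-01-01', periods=3, tz='America/New_York'
          )
      })
      df = df.set_index('a')

      # Round-trip through an Arrow Table; the index should come back unchanged.
      df_pa = pa.Table.from_pandas(df).to_pandas()

      print(df)
      print(df_pa)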
      

      The output is:

      Empty DataFrame
      Columns: []
      Index: [2017-01-01 00:00:00-05:00, 2017-01-02 00:00:00-05:00, 2017-01-03 00:00:00-05:00]
      Empty DataFrame
      Columns: []
      Index: [2017-01-01 05:00:00-05:00, 2017-01-02 05:00:00-05:00, 2017-01-03 05:00:00-05:00]
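
      The five-hour shift in the second index can be reproduced with pandas alone. The following is a minimal sketch, not the actual pyarrow code: it assumes the index values are held as naive datetime64[ns] UTC instants, and the names stored, wrong, and right are hypothetical, chosen only for illustration.

```python
import pandas as pd

# The tz-aware index from the example above.
original = pd.date_range(start='2017-01-01', periods=3, tz='America/New_York')

# Assumed storage form: naive datetime64[ns] values holding the UTC instants.
stored = original.tz_convert('UTC').tz_localize(None)

# Interpreting the stored values directly in the target zone keeps the
# UTC wall-clock time and only attaches the offset, so the instants shift.
wrong = stored.tz_localize('America/New_York')

# Interpreting the stored values as UTC and then converting to the target
# zone preserves the instants.
right = stored.tz_localize('UTC').tz_convert('America/New_York')

print(wrong)
print(right)
```

      Under these assumptions, wrong reproduces the shifted index shown in the second DataFrame above, while right is equal to the original index.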
      

            People

              Assignee: Albert Shieh (adshieh)
              Reporter: Albert Shieh (adshieh)
              Votes: 0
              Watchers: 5
