Apache Arrow / ARROW-13756

[Python] Error in pandas conversion for datetimetz column index


Details

    Description

      The following code fails with:

      File "[...]/lib/python3.8/site-packages/pyarrow/pandas_compat.py", line 1052, in _pandas_type_to_numpy_type
       return np.dtype(pandas_type)
      TypeError: data type 'datetimetz' not understood
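The last frame passes `pandas_type` straight to `np.dtype`. The sketch below uses plain NumPy and a hypothetical helper, `is_numpy_dtype`, to show that a physical dtype string such as `datetime64[ns]` is accepted, while the logical name `datetimetz` is rejected with the same `TypeError` seen in the traceback.

```python
import numpy as np


def is_numpy_dtype(name: str) -> bool:
    """Hypothetical helper: True if NumPy accepts `name` as a dtype string."""
    try:
        np.dtype(name)
    except TypeError:
        # Same failure as in the traceback: "data type ... not understood"
        return False
    return True


print(is_numpy_dtype("datetime64[ns]"))  # True: a real NumPy dtype
print(is_numpy_dtype("datetimetz"))      # False: a logical type name, not a NumPy dtype
```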

      Sample:

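      import pandas as pd
      import pyarrow as pa
      import pyarrow.parquet as pq


      def run():
          # write_to_dataset treats this path as the root directory of a dataset
          filename = "test.parquet"
          df = pd.DataFrame(
              data=range(31),
              columns=list("A"),
              index=pd.date_range("2021-01-01", "2021-01-31", freq="D", tz="CET"),
          ).T  # transpose: the tz-aware dates become the column labels
          table = pa.Table.from_pandas(df)
          pq.write_to_dataset(table, root_path=filename)
          # Fails here with the TypeError shown above
          result = pq.read_table(filename).to_pandas()
          return result


      if __name__ == "__main__":
          run()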
      

      The code stores a DataFrame whose column labels are timezone-aware datetimes.

      Observations:
      If I remove the .T at the end of the DataFrame, so that the datetime index stays on the rows, it works (but that is not what I want).
      If I remove the timezone information (tz="CET"), the code works.

      I assume this bug is related to "Error in pandas conversion for datetimetz row index".


            People

              Alenka Frim
              Andreas Wolf
              Votes: 0
              Watchers: 3


                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h 20m