Apache Arrow / ARROW-10511

[Python] Table.to_pandas() failing when timezone-awareness mismatch in metadata

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Python
    • Environment: Ubuntu 20.04, Python 3.8.6, Pandas 1.1.4

    Description

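      We're having an issue with timezones in Table.to_pandas(). When a timezone-naive pandas column is converted with a schema that declares it as a timezone-aware timestamp, the resulting table cannot be converted back to pandas. See the example below.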

      import pyarrow as pa
      import pandas as pd
      
      print(pa.__version__)
      # 2.0.0
      
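      # A timezone-naive datetime column
      df = pd.DataFrame({"time": pd.to_datetime([0, 0])})
      
      # The schema declares the same column as timezone-aware (UTC)
      time_field = pa.field("time", type=pa.timestamp("ms", tz="utc"), nullable=False)
      schema = pa.schema([time_field])
      
      tab = pa.Table.from_pandas(df, schema)
      
      # Raises the TypeError shown in the traceback
      tab.to_pandas()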
      
      # File ".../pandas_compat.py", line 777, in table_to_blockmanager
      #   table = _add_any_metadata(table, pandas_metadata)
      # File ".../pandas_compat.py", line 1184, in _add_any_metadata
      #   tz = col_meta['metadata']['timezone']
      # TypeError: 'NoneType' object is not subscriptable
      
      

      Related issues:
      https://github.com/catalyst-cooperative/pudl/issues/705

            People

              jorisvandenbossche Joris Van den Bossche
              karldw Karl Dunkle Werner