Apache Arrow / ARROW-15253

[Python] Error in to_pandas for empty dataframe with pd.interval_range index


    Description

      In _table_to_blocks (pandas_compat.py) the input extension_columns is equal to

      {None: interval[int64, right]}

      for a pd.interval_range index, so an error is triggered because the None key cannot be encoded to bytes. The same happens for pd.PeriodIndex.

      Example:

      import pandas as pd
      import pyarrow as pa
      
      # Empty DataFrame whose only content is an unnamed interval index
      df = pd.DataFrame(index=pd.interval_range(start=0, end=5))
      table = pa.table(df)
      table.to_pandas()  # raises TypeError: expected bytes, NoneType found
      
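      The None key in extension_columns comes from the stored pandas metadata: the unnamed interval index is serialized as an extra column whose "name" is null. A small check of that metadata (a sketch; the __index_level_0__ field name in the comment is the label pyarrow normally gives an unnamed index level, not a value taken from this report):

      import pandas as pd
      import pyarrow as pa

      df = pd.DataFrame(index=pd.interval_range(start=0, end=5))
      table = pa.table(df)

      # The index is stored as a column with "name": None, and that None later
      # becomes the key passed to _table_to_blocks via extension_columns.
      for col in table.schema.pandas_metadata["columns"]:
          print(col["name"], col["field_name"])  # e.g. None __index_level_0__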

      Error (this traceback was captured with the equivalent pd.PeriodIndex example; the pd.interval_range case fails the same way):

      TypeError                                 Traceback (most recent call last)
      /var/folders/gw/q7wqd4tx18n_9t4kbkd0bj1m0000gn/T/ipykernel_13963/1439451337.py in <module>
            1 df5 = pd.DataFrame(index=pd.PeriodIndex(year=[2000, 2002], quarter=[1, 3]))
            2 table5 = pa.table(df5)
      ----> 3 table5.to_pandas().shape
      
      ~/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
          764             self_destruct=self_destruct
          765         )
      --> 766         return self._to_pandas(options, categories=categories,
          767                                ignore_metadata=ignore_metadata,
          768                                types_mapper=types_mapper)
      
      ~/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
         1819                    types_mapper=None):
         1820         from pyarrow.pandas_compat import table_to_blockmanager
      -> 1821         mgr = table_to_blockmanager(
         1822             options, self, categories,
         1823             ignore_metadata=ignore_metadata,
      
      ~/repos/arrow/python/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
          787     _check_data_column_metadata_consistency(all_columns)
          788     columns = _deserialize_column_index(table, all_columns, column_indexes)
      --> 789     blocks = _table_to_blocks(options, table, categories, ext_columns_dtypes)
          790 
          791     axes = [columns, index]
      
      ~/repos/arrow/python/pyarrow/pandas_compat.py in _table_to_blocks(options, block_table, categories, extension_columns)
         1133     # Convert an arrow table to Block from the internal pandas API
         1134     columns = block_table.column_names
      -> 1135     result = pa.lib.table_to_blocks(options, block_table, categories,
         1136                                     list(extension_columns.keys()))
         1137     return [_reconstruct_block(item, columns, extension_columns)
      
      ~/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.table_to_blocks()
         1215         c_options.categorical_columns = {tobytes(cat) for cat in categories}
         1216     if extension_columns is not None:
      -> 1217         c_options.extension_columns = {tobytes(col)
         1218                                        for col in extension_columns}
         1219 
      
      ~/repos/arrow/python/pyarrow/lib.cpython-39-darwin.so in set.from_py.__pyx_convert_unordered_set_from_py_std_3a__3a_string()
      
      ~/repos/arrow/python/pyarrow/lib.cpython-39-darwin.so in string.from_py.__pyx_convert_string_from_py_std__in_string()
      
      TypeError: expected bytes, NoneType found
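
      A possible way to sidestep the crash while the issue is open (not mentioned in this report, and it drops the index on the round trip) is to not serialize the index at all, so the null-named extension column is never created:

      import pandas as pd
      import pyarrow as pa

      df = pd.DataFrame(index=pd.interval_range(start=0, end=5))

      # Without the stored index there is no unnamed extension column,
      # so to_pandas() completes; the interval index itself is lost.
      table = pa.Table.from_pandas(df, preserve_index=False)
      print(table.to_pandas())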
      

            People

              Assignee: Alenka Frim
              Reporter: Alenka Frim
