Details
- New Feature
- Status: Resolved
- Major
- Resolution: Fixed
- None
Description
From https://github.com/pandas-dev/pandas/issues/37897
The roundtrip of an empty pandas.DataFrame with an index (so no columns, but a non-zero shape for the rows) isn't faithful:
In [33]: df = pd.DataFrame(index=pd.RangeIndex(0, 10, 1))

In [34]: df
Out[34]:
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [35]: df.shape
Out[35]: (10, 0)

In [36]: table = pa.table(df)

In [37]: table.to_pandas()
Out[37]:
Empty DataFrame
Columns: []
Index: []

In [38]: table.to_pandas().shape
Out[38]: (0, 0)
Since the pandas metadata in the Table actually have this RangeIndex information:
In [39]: table.schema.pandas_metadata
Out[39]:
{'index_columns': [{'kind': 'range',
   'name': None,
   'start': 0,
   'stop': 10,
   'step': 1}],
 'column_indexes': [{'name': None,
   'field_name': None,
   'pandas_type': 'empty',
   'numpy_type': 'object',
   'metadata': None}],
 'columns': [],
 'creator': {'library': 'pyarrow', 'version': '3.0.0.dev162+g305160495'},
 'pandas_version': '1.2.0.dev0+1225.g91f5bfcdc4'}
we should in principle be able to correctly roundtrip this case.
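To illustrate why the metadata should be sufficient, here is a minimal sketch in plain Python that recovers the row range from a metadata dict shaped like the one above. The helper name `range_index_from_metadata` is hypothetical and not part of pyarrow, and this is not how the actual fix is implemented; it only shows that the 'range' descriptor carries enough information to rebuild the 10 rows.

```python
def range_index_from_metadata(pandas_metadata):
    """Return a Python range for the first 'range' index descriptor, or None.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    for descr in pandas_metadata.get("index_columns", []):
        # Assumption: a RangeIndex is stored as a dict with kind == 'range',
        # as in the metadata shown above; other entries are skipped.
        if isinstance(descr, dict) and descr.get("kind") == "range":
            return range(descr["start"], descr["stop"], descr["step"])
    return None


# Trimmed copy of the metadata from the session above.
metadata = {
    "index_columns": [
        {"kind": "range", "name": None, "start": 0, "stop": 10, "step": 1}
    ],
    "columns": [],
}

rows = range_index_from_metadata(metadata)
print(len(rows))  # 10
```

With pandas available, the same start/stop/step values could be passed to pd.RangeIndex to restore the original (10, 0) shape.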