ARROW-10643

[Python] Pandas<->pyarrow roundtrip failing to recreate index for empty dataframe

    Description

      From https://github.com/pandas-dev/pandas/issues/37897

      The roundtrip of an empty pandas.DataFrame with an index (so no columns, but a non-zero number of rows) isn't faithful:

      In [33]: df = pd.DataFrame(index=pd.RangeIndex(0, 10, 1))
      
      In [34]: df
      Out[34]: 
      Empty DataFrame
      Columns: []
      Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
      
      In [35]: df.shape
      Out[35]: (10, 0)
      
      In [36]: table = pa.table(df)
      
      In [37]: table.to_pandas()
      Out[37]: 
      Empty DataFrame
      Columns: []
      Index: []
      
      In [38]: table.to_pandas().shape
      Out[38]: (0, 0)
      

      Since the pandas metadata in the Table actually contains this RangeIndex information:

      In [39]: table.schema.pandas_metadata
      Out[39]: 
      {'index_columns': [{'kind': 'range',
         'name': None,
         'start': 0,
         'stop': 10,
         'step': 1}],
       'column_indexes': [{'name': None,
         'field_name': None,
         'pandas_type': 'empty',
         'numpy_type': 'object',
         'metadata': None}],
       'columns': [],
       'creator': {'library': 'pyarrow', 'version': '3.0.0.dev162+g305160495'},
       'pandas_version': '1.2.0.dev0+1225.g91f5bfcdc4'}
      

      we should in principle be able to correctly roundtrip this case.
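
As an illustration of why the metadata is sufficient, here is a minimal sketch that recovers the range from a `pandas_metadata` dict like the one shown above. The helper name `range_index_from_metadata` is hypothetical and is not part of pyarrow; this is a user-side workaround under the assumption that the table has exactly one index descriptor of kind 'range', not the fix pyarrow itself would apply.

```python
def range_index_from_metadata(pandas_metadata):
    """Return a Python range for a single 'range' index descriptor, else None.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    index_columns = (pandas_metadata or {}).get("index_columns", [])
    if len(index_columns) != 1:
        return None
    descr = index_columns[0]
    # Indexes stored as real columns appear as plain column-name strings,
    # so only dict descriptors with kind == 'range' are handled here.
    if not isinstance(descr, dict) or descr.get("kind") != "range":
        return None
    return range(descr["start"], descr["stop"], descr["step"])


# Metadata copied from the Out[39] output above, trimmed to the relevant keys.
metadata = {
    "index_columns": [
        {"kind": "range", "name": None, "start": 0, "stop": 10, "step": 1}
    ],
    "columns": [],
}

recovered = range_index_from_metadata(metadata)
print(len(recovered))  # 10
```

With pandas available, `pd.DataFrame(index=pd.RangeIndex(recovered.start, recovered.stop, recovered.step))` would then rebuild a frame of shape (10, 0), matching the original `df`.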

          People

            alenka Alenka Frim
            jorisvandenbossche Joris Van den Bossche

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 5h 10m