Apache Arrow / ARROW-15370

[Python] Regression in empty table to_pandas conversion

Details

    Description

      Nightly integration tests with kartothek are failing, see e.g. https://github.com/ursacomputing/crossbow/runs/4863725914?check_suite_focus=true

      This appears to be caused by something on our side, and it is a recent regression (the builds only started failing today, and I don't see any other differences from the last working build yesterday).

      Update: here is a reproducer:

      In [4]: df = pd.DataFrame({'a': [1, 2], 'b': [0.1, 0.2]})
      
      In [5]: table = pa.table(df)
      
      In [6]: table.schema.empty_table().to_pandas()
      ---------------------------------------------------------------------------
      ValueError                                Traceback (most recent call last)
      <ipython-input-6-a03ecffc0af8> in <module>
      ----> 1 table.schema.empty_table().to_pandas()
      
      ~/scipy/repos/arrow/python/pyarrow/array.pxi in pyarrow.lib._PandasConvertible.to_pandas()
      
      ~/scipy/repos/arrow/python/pyarrow/table.pxi in pyarrow.lib.Table._to_pandas()
      
      ~/scipy/repos/arrow/python/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, categories, ignore_metadata, types_mapper)
          790 
          791     axes = [columns, index]
      --> 792     return BlockManager(blocks, axes)
          793 
          794 
      
      ~/miniconda3/envs/arrow-dev/lib/python3.8/site-packages/pandas/core/internals/managers.py in __init__(self, blocks, axes, verify_integrity)
          912                         pass
          913 
      --> 914             self._verify_integrity()
          915 
          916     def _verify_integrity(self) -> None:
      
      ~/miniconda3/envs/arrow-dev/lib/python3.8/site-packages/pandas/core/internals/managers.py in _verify_integrity(self)
          919         for block in self.blocks:
          920             if block.shape[1:] != mgr_shape[1:]:
      --> 921                 raise construction_error(tot_items, block.shape[1:], self.axes)
          922         if len(self.items) != tot_items:
          923             raise AssertionError(
      
      ValueError: Empty data passed with indices specified.
      

      It happens specifically if the schema still has pandas metadata that describes the index as a range. We try to recreate that range index from the metadata, but its length doesn't match the actual length of the (empty) table.


            People

              Assignee: jorisvandenbossche Joris Van den Bossche
              Reporter: jorisvandenbossche Joris Van den Bossche
                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 1h 40m