Apache Arrow / ARROW-5353

0-row table can be written but not read

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.11.0, 0.12.0, 0.13.0
    • Fix Version/s: None
    • Component/s: C++, Python
    • Labels: None

    Description

      I can serialize a table with 0 rows, but I cannot read it back. The following code

      import pandas as pd
      import pyarrow as pa
      
      df = pd.DataFrame({'x': [0,1,2]})[:0]
      fnm = "tbl.arr"
      
      tbl = pa.Table.from_pandas(df)
      print(tbl.schema)
      
      writer = pa.RecordBatchStreamWriter(fnm, tbl.schema)
      writer.write_table(tbl)
      
      reader = pa.RecordBatchStreamReader(fnm)
      tbl2 = reader.read_all()
      

      ...results in the following output:

      x: int64
      metadata
      --------
      OrderedDict([(b'pandas',
                    b'{"index_columns": [{"kind": "range", "name": null, "start": '
                    b'0, "stop": 0, "step": 1}], "column_indexes": [{"name": null,'
                    b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
                    b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
                    b'{"name": "x", "field_name": "x", "pandas_type": "int64", "nu'
                    b'mpy_type": "int64", "metadata": null}], "creator": {"library'
                    b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
      ---------------------------------------------------------------------------
      ArrowInvalid                              Traceback (most recent call last)
      <ipython-input-3-8869ad9b37c6> in <module>
           11 writer.write_table(tbl)
           12 
      ---> 13 reader = pa.RecordBatchStreamReader(fnm)
           14 tbl2 = reader.read_all()
      
      ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.py in __init__(self, source)
           56     """
           57     def __init__(self, source):
      ---> 58         self._open(source)
           59 
           60 
      
      ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.pxi in pyarrow.lib._RecordBatchStreamReader._open()
      
      ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
      
      ArrowInvalid: Expected schema message in stream, was null or length 0
      

      Since the schema alone should be sufficient to build a table, even when there is no actual data, I would expect this to succeed and return the same 0-row table as the input rather than fail.

       

          People

            Assignee: Unassigned
            Reporter: Thomas Buhrmann (buhrmann)