Apache Arrow / ARROW-5353

0-row table can be written but not read

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.11.0, 0.12.0, 0.13.0
    • Fix Version/s: None
    • Component/s: C++, Python
    • Labels:
      None

      Description

      I can serialize a table with 0 rows, but I cannot read it back. The following code:

      import pandas as pd
      import pyarrow as pa
      
      df = pd.DataFrame({'x': [0,1,2]})[:0]
      fnm = "tbl.arr"
      
      tbl = pa.Table.from_pandas(df)
      print(tbl.schema)
      
      writer = pa.RecordBatchStreamWriter(fnm, tbl.schema)
      writer.write_table(tbl)
      
      reader = pa.RecordBatchStreamReader(fnm)
      tbl2 = reader.read_all()
      

      ...results in the following output:

      x: int64
      metadata
      --------
      OrderedDict([(b'pandas',
                    b'{"index_columns": [{"kind": "range", "name": null, "start": '
                    b'0, "stop": 0, "step": 1}], "column_indexes": [{"name": null,'
                    b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
                    b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
                    b'{"name": "x", "field_name": "x", "pandas_type": "int64", "nu'
                    b'mpy_type": "int64", "metadata": null}], "creator": {"library'
                    b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
      ---------------------------------------------------------------------------
      ArrowInvalid                              Traceback (most recent call last)
      <ipython-input-3-8869ad9b37c6> in <module>
           11 writer.write_table(tbl)
           12 
      ---> 13 reader = pa.RecordBatchStreamReader(fnm)
           14 tbl2 = reader.read_all()
      
      ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.py in __init__(self, source)
           56     """
           57     def __init__(self, source):
      ---> 58         self._open(source)
           59 
           60 
      
      ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.pxi in pyarrow.lib._RecordBatchStreamReader._open()
      
      ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()
      
      ArrowInvalid: Expected schema message in stream, was null or length 0
      

      Since the schema alone should be sufficient to build a table, even one without any data, I would expect this to return the same 0-row input table rather than fail.

            People

            • Assignee: Unassigned
            • Reporter: Thomas Buhrmann (buhrmann)