Apache Arrow / ARROW-5952

[Python] Segfault when reading empty table with category as pandas dataframe

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.14.0, 0.14.1
    • Fix Version/s: 0.15.0
    • Component/s: Python
    • Environment:
      Linux 3.10.0-327.36.3.el7.x86_64
      Python 3.6.8
      Pandas 0.24.2
      Pyarrow 0.14.0

      Description

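      I have two short sample programs that demonstrate the issue. The first writes an empty table with a categorical column to an Arrow file:

      import pyarrow as pa
      import pandas as pd

      # Empty DataFrame whose only column has the categorical dtype
      empty = pd.DataFrame({'foo': []}, dtype='category')
      table = pa.Table.from_pandas(empty)
      outfile = pa.output_stream('bar')
      writer = pa.RecordBatchFileWriter(outfile, table.schema)
      writer.write(table)
      writer.close()

      The second reads that file back as a pandas DataFrame and crashes:

      import pyarrow as pa
      pa.ipc.open_file('bar').read_pandas()

      Segmentation fault
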

      My apologies if this was already reported elsewhere; I searched but could not find an issue that seemed to describe the same behavior.
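
A segfault takes the whole interpreter down, so when checking whether a given pyarrow build is affected it can help to run the reader program in a child process and inspect how it exited. The following is only a sketch under stated assumptions: `run_isolated` and `READER` are hypothetical names introduced here (they are not part of pyarrow), the file 'bar' is assumed to have been written by the first program, and the signal check assumes a POSIX system.

```python
import signal
import subprocess
import sys

# Reader program from the report, held in a string so it can run in a child.
# Assumes the file 'bar' was already written by the first program.
READER = "import pyarrow as pa\npa.ipc.open_file('bar').read_pandas()\n"


def run_isolated(code):
    """Run `code` in a child interpreter and describe how it exited.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number (SIGSEGV for a segfault).
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = "signal {}".format(-proc.returncode)
        return "killed by " + name
    return "failed with exit code {}".format(proc.returncode)


if __name__ == "__main__":
    # The parent survives even if the child segfaults.
    print(run_isolated(READER))
```

On an affected version one would expect the child to be reported as killed by a signal rather than exiting normally; a missing 'bar' file or a missing pyarrow install shows up as an ordinary non-zero exit code instead.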

              People

              • Assignee: Joris Van den Bossche (jorisvandenbossche)
              • Reporter: Daniel Nugent (nugend)
              • Votes: 0
              • Watchers: 3

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 1h