Apache Arrow / ARROW-1998

[Python] Table.from_pandas crashes when data frame is empty

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.9.0
    • Component/s: Python
    • Environment:
      Windows 10 Build 15063.850
      Python: 3.6.3
      Numpy: 1.14.0
      Pandas: 0.22.0

      Description

      Loading an empty CSV file with pandas and then passing the resulting empty DataFrame to pa.Table.from_pandas crashes the application. The following code should reproduce the issue:

      import numpy as np
      import pandas as pd
      import pyarrow as pa
      
      FIELDS = ['id', 'name']
      NUMPY_TYPES = {
          'id': np.int64,
          'name': str  # originally np.unicode, an alias of str that NumPy 1.24 removed
      }
      PYARROW_SCHEMA = pa.schema([
          pa.field('id', pa.int64()),
          pa.field('name', pa.string())
      ])
      
      # Create an empty input file
      file = open('input.csv', 'w')
      file.close()
      
      df = pd.read_csv(
          'input.csv',
          header=None,
          names=FIELDS,
          dtype=NUMPY_TYPES,
          engine='c',
      )
      
      pa.Table.from_pandas(df, schema=PYARROW_SCHEMA)
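
On affected versions, one possible way to sidestep the crash is to detect the empty input before it reaches pandas or PyArrow. The following is a minimal standard-library sketch of such a guard, not part of the original report or of the eventual fix; the helper name csv_has_rows and the file name filled.csv are hypothetical and used only for illustration.

```python
import csv
import os
import tempfile


def csv_has_rows(path):
    """Return True if the CSV file at `path` contains at least one non-empty row."""
    if os.path.getsize(path) == 0:
        return False  # zero-byte file, like the one created in the reproduction
    with open(path, newline='') as handle:
        # csv.reader yields [] for empty lines, so any() skips over them
        return any(row for row in csv.reader(handle))


# Demonstration with a throwaway empty file and a one-row file
with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, 'input.csv')
    open(empty_path, 'w').close()

    filled_path = os.path.join(tmp, 'filled.csv')
    with open(filled_path, 'w', newline='') as handle:
        csv.writer(handle).writerow([1, 'alice'])

    empty_result = csv_has_rows(empty_path)
    filled_result = csv_has_rows(filled_path)
```

A caller could skip the pa.Table.from_pandas call, or handle the empty case separately, whenever the guard returns False. Whether that is appropriate depends on how the application treats empty inputs.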
      

              People

              • Assignee:
                pitrou Antoine Pitrou
              • Reporter:
                betabandido Victor Jimenez