[ARROW-2406] [Python] Segfault when creating PyArrow table from Pandas for empty string column when schema provided - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 0.8.0
Fix Version/s: 0.9.0
Component/s: Python
Labels: None
Environment: Mac OS High Sierra, Python 3.6.3

External issue URL:
https://github.com/apache/arrow/issues/18470

Description

Minimal example to recreate:

import pandas as pd
import pyarrow as pa

df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)

This causes the Python interpreter to exit with "Segmentation fault: 11".

The following examples all work without any issue:

# column 'a' is no longer empty
df = pd.DataFrame({'a': ['foo']})
df['a'] = df['a'].astype(str)
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)

# column 'a' is empty, but no schema is specified
df = pd.DataFrame({'a': []})
df['a'] = df['a'].astype(str)
pa.Table.from_pandas(df)

# column 'a' is empty, but is not cast to str in pandas
df = pd.DataFrame({'a': []})
schema = pa.schema([pa.field('a', pa.string())])
pa.Table.from_pandas(df, schema=schema)
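
Because the failing case kills the interpreter, it can be convenient to run each of the four cases above in a separate child process and compare exit statuses. The following is a minimal sketch under stated assumptions, not part of the original report: `run_isolated`, `describe`, `PREAMBLE`, and `CASES` are hypothetical names introduced here, and the outcome of each case depends on the installed pyarrow and pandas versions (the issue lists 0.8.0 as affected and 0.9.0 as fixed).

```python
import signal
import subprocess
import sys
import textwrap

# Imports shared by every case; prepended to each snippet below.
PREAMBLE = "import pandas as pd\nimport pyarrow as pa\n"

# The four cases from the report, keyed by a short description.
CASES = {
    "empty str column, schema given (reported crash)": """
        df = pd.DataFrame({'a': []})
        df['a'] = df['a'].astype(str)
        schema = pa.schema([pa.field('a', pa.string())])
        pa.Table.from_pandas(df, schema=schema)
    """,
    "non-empty str column, schema given": """
        df = pd.DataFrame({'a': ['foo']})
        df['a'] = df['a'].astype(str)
        schema = pa.schema([pa.field('a', pa.string())])
        pa.Table.from_pandas(df, schema=schema)
    """,
    "empty str column, no schema": """
        df = pd.DataFrame({'a': []})
        df['a'] = df['a'].astype(str)
        pa.Table.from_pandas(df)
    """,
    "empty column without astype(str), schema given": """
        df = pd.DataFrame({'a': []})
        schema = pa.schema([pa.field('a', pa.string())])
        pa.Table.from_pandas(df, schema=schema)
    """,
}


def run_isolated(snippet: str) -> int:
    """Run `snippet` in a fresh interpreter and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )
    return result.returncode


def describe(returncode: int) -> str:
    """Turn a child exit status into a short human-readable message."""
    if returncode == 0:
        return "completed normally"
    if returncode < 0:
        # On POSIX, a negative status means the child was killed by a signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown"
        return f"killed by signal {-returncode} ({name})"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    for label, body in CASES.items():
        code = PREAMBLE + textwrap.dedent(body)
        print(f"{label}: {describe(run_isolated(code))}")
```

A child that segfaults is reported as killed by a signal instead of taking the parent process down with it. A plain non-zero exit status usually means an ordinary Python exception in the child, such as a missing pandas or pyarrow installation.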

People

Assignee: Unassigned

Reporter: Dave Challis

Votes: 0

Watchers: 3

Dates

Created: 06/Apr/18 11:40

Updated: 11/Jan/23 07:20

Resolved: 09/Apr/18 08:36