Details
- Type: Bug
- Status: Resolved
- Priority: Minor
- Resolution: Fixed
- Affects Version: 0.14.1
- Environment: Ubuntu 16.04. Pyarrow 0.14.1 installed through pip. Using Anaconda distribution of Python 3.7.
Description
When attempting to write a pyarrow table to Parquet, I observe a segfault when there is a mismatch between the schema and the data types.
Here is a reproducible example:
import pyarrow as pa
import pyarrow.parquet as pq

data = dict()
data["key"] = [0, 1, 2, 3]  # segfault
#data["key"] = ["0", "1", "2", "3"]  # no segfault
schema = pa.schema({"key" : pa.string()})
table = pa.Table.from_pydict(data, schema = schema)
print("now writing out test file")
pq.write_table(table, "test.parquet")
This results in a segfault when writing the table. Running
gdb -ex r --args python test.py
yields:
Program received signal SIGSEGV, Segmentation fault.
0x00007fffe8173917 in virtual thunk to parquet::DictEncoderImpl<parquet::DataType<(parquet::Type::type)6> >::Put(parquet::ByteArray const*, int) ()
   from /net/fantasia/home/jweinstk/anaconda3/lib/python3.7/site-packages/pyarrow/libparquet.so.14
Thanks for all of your arrow work,
Josh