Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Python example:
import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow.tests import util

repeats = 10
nunique = 5
data = [
    [[util.rands(10)] for i in range(nunique)] * repeats,
]
table = pa.table(data, names=['f0'])
pq.write_table(table, "test_dictionary.parquet")
Reading the file directly with the Parquet reader works:
>>> pq.read_table("test_dictionary.parquet", read_dictionary=['f0.list.item'])
pyarrow.Table
f0: list<item: dictionary<values=string, indices=int32, ordered=0>>
  child 0, item: dictionary<values=string, indices=int32, ordered=0>
but doing the same with the datasets API segfaults:
>>> import pyarrow.dataset as ds
>>> fmt = ds.ParquetFileFormat(read_options=dict(dictionary_columns=["f0.list.item"]))
>>> dataset = ds.dataset("test_dictionary.parquet", format=fmt)
>>> dataset.to_table()
Segmentation fault (core dumped)