[ARROW-8799] [C++][Dataset] Reading list column as nested dictionary segfaults - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.0.0
Component/s: C++
Labels:
- dataset
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/24944

Description

Python example:

import pyarrow as pa
import pyarrow.parquet as pq
from pyarrow.tests import util

repeats = 10
nunique = 5

data = [
    [[util.rands(10)] for i in range(nunique)] * repeats,
]
table = pa.table(data, names=['f0'])

pq.write_table(table, "test_dictionary.parquet")

Reading with the parquet code works:

>>> pq.read_table("test_dictionary.parquet", read_dictionary=['f0.list.item'])
pyarrow.Table
f0: list<item: dictionary<values=string, indices=int32, ordered=0>>
  child 0, item: dictionary<values=string, indices=int32, ordered=0>

but doing the same with the datasets API segfaults:

>>> import pyarrow.dataset as ds
>>> fmt = ds.ParquetFileFormat(read_options=dict(dictionary_columns=["f0.list.item"]))
>>> dataset = ds.dataset("test_dictionary.parquet", format=fmt)
>>> dataset.to_table()
Segmentation fault (core dumped)
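
Because the crash takes down the whole interpreter, it can be convenient to run the failing read in a child process and inspect how it ended. The following is a minimal sketch of that idea, not part of the original report: the helper names run_isolated and describe and the REPRO constant are hypothetical, the signal interpretation assumes a POSIX system, and REPRO assumes pyarrow is installed and that test_dictionary.parquet was already written by the example above.

```python
import signal
import subprocess
import sys

# Reproduction from the report, kept as a string so it can run in a child process.
# Assumption: pyarrow is installed and test_dictionary.parquet already exists.
REPRO = """
import pyarrow.dataset as ds
fmt = ds.ParquetFileFormat(read_options=dict(dictionary_columns=["f0.list.item"]))
print(ds.dataset("test_dictionary.parquet", format=fmt).to_table().schema)
"""


def run_isolated(snippet):
    """Run `snippet` in a fresh interpreter and return the CompletedProcess.

    -X faulthandler asks the child to dump a Python traceback to stderr
    if it receives a fatal signal such as SIGSEGV.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )


def describe(result):
    """Summarise how the child ended, using the POSIX return code convention."""
    if result.returncode == 0:
        return "ok"
    if result.returncode < 0:
        # On POSIX, a negative return code means the child died from that signal.
        try:
            return "killed by " + signal.Signals(-result.returncode).name
        except ValueError:
            return "killed by signal %d" % -result.returncode
    return "exited with code %d" % result.returncode
```

Calling describe(run_isolated(REPRO)) from the directory containing the file then reports whether the datasets read completed or was killed by a signal, and the child's stderr carries the faulthandler traceback if it crashed.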

Issue Links

links to GitHub Pull Request #7181

People

Assignee: Ben Kietzman
Reporter: Joris Van den Bossche
Votes: 0
Watchers: 3

Dates

Created: 14/May/20 13:47
Updated: 11/Jan/23 08:02
Resolved: 08/Jun/20 17:15

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 1h 10m