[ARROW-3208] [C++] Segmentation fault when casting dictionary to numeric with nullptr valid_bitmap - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 0.9.0
Fix Version/s: 0.13.0
Component/s: C++
Labels:
- parquet
- pull-request-available
Environment:
Ubuntu 16.04 LTS; System76 Oryx Pro

External issue URL:
https://github.com/apache/arrow/issues/19552

Description

Steps to reproduce:

Create a partitioned dataset with the following code:

```python
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({
    'one': [-1, 10, 2.5, 100, 1000, 1, 29.2],
    'two': [-1, 10, 2, 100, 1000, 1, 11],
    'three': [0, 0, 0, 0, 0, 0, 0],
})

table = pa.Table.from_pandas(df)
pq.write_to_dataset(table, root_path='/home/yingw787/misc/example_dataset', partition_cols=['one', 'two'])
```

Read the partitioned dataset back into a PyArrow Table, then write that table to a single Parquet file:

```python
import pyarrow.parquet as pq

table = pq.ParquetDataset('/path/to/dataset').read()
pq.write_table(table, '/path/to/example.parquet')
```

EXPECTED:

Successful write

GOT:

Segmentation fault

Issue reference on GitHub mirror: https://github.com/apache/arrow/issues/2511
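The issue title attributes the crash to a dictionary-to-numeric cast that encounters a null validity bitmap. In the Arrow columnar format, an array with no nulls may omit its validity bitmap entirely, so any code that reads validity bits has to treat a missing bitmap as "all slots valid". The following is a minimal pure-Python sketch of that guard, not the Arrow C++ code or the actual fix; `is_valid` and `decode_dictionary` are hypothetical names introduced only for illustration.

```python
from typing import List, Optional, Sequence


def is_valid(valid_bitmap: Optional[bytes], i: int) -> bool:
    """Return True if slot i is non-null.

    A missing bitmap (None) means every slot is valid, so it must be
    checked before any bit is read.
    """
    if valid_bitmap is None:
        return True
    # Validity bits are stored least-significant-bit first within each byte.
    return bool((valid_bitmap[i // 8] >> (i % 8)) & 1)


def decode_dictionary(
    indices: Sequence[int],
    dictionary: Sequence[float],
    valid_bitmap: Optional[bytes] = None,
) -> List[Optional[float]]:
    """Expand dictionary indices into plain numeric values (None for nulls)."""
    out: List[Optional[float]] = []
    for i, idx in enumerate(indices):
        out.append(dictionary[idx] if is_valid(valid_bitmap, i) else None)
    return out


# No bitmap at all: the case that corresponds to a nullptr valid_bitmap.
no_bitmap = decode_dictionary([0, 1, 0, 2], [-1.0, 10.0, 2.5])

# Bitmap 0b00001101 marks slot 1 as null and slots 0, 2, 3 as valid.
with_nulls = decode_dictionary([0, 1, 0, 2], [-1.0, 10.0, 2.5], bytes([0b00001101]))
```

Without the `None` check in `is_valid`, the first call would fail when indexing the bitmap; in C++ the analogous unchecked read dereferences a null pointer.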

Issue Links

links to GitHub Pull Request #3978

People

Assignee: Francois Saint-Jacques
Reporter: Ying Wang
Votes: 0
Watchers: 4

Dates

Created: 10/Sep/18 19:18
Updated: 11/Jan/23 07:25
Resolved: 20/Mar/19 08:48

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 20m