[PARQUET-1100] [C++] Reading repeated types should decode number of records rather than number of values - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: cpp-1.2.0
Fix Version/s: cpp-1.3.0
Component/s: parquet-cpp
Labels: None

Description

Reading the attached Parquet file into a pandas DataFrame and then using the DataFrame causes a segmentation fault.

Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar  6 2017, 11:58:13) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import pyarrow
>>> import pyarrow.parquet as pq
>>> pyarrow.__version__
'0.6.0'
>>> import pandas as pd
>>> pd.__version__
'0.19.0'
>>> df = pq.read_table('part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet') \
...        .to_pandas()
>>> len(df)
69
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69 entries, 0 to 68
Data columns (total 6 columns):
label               69 non-null int32
account_meta        69 non-null object
features_type       69 non-null int32
features_size       69 non-null int32
features_indices    1 non-null object
features_values     1 non-null object
dtypes: int32(3), object(3)
memory usage: 2.5+ KB
>>> 
>>> pd.concat([df, df])
Segmentation fault (core dumped)

Actually, just print(df) is enough to trigger the segfault.
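
The issue title distinguishes between the number of values and the number of records in a repeated column. As a rough illustration only, and not the parquet-cpp code or the actual fix, the sketch below counts records from repetition levels. It relies on the Parquet convention that a repetition level of 0 marks the start of a new record. The helper name `count_records` and the sample levels are hypothetical and are not taken from the attached file.

```python
def count_records(repetition_levels):
    """Count records in a repeated column from its repetition levels.

    Hypothetical helper for illustration; not part of pyarrow or parquet-cpp.
    """
    # A repetition level of 0 marks the first entry of a new record;
    # higher levels continue the current record.
    return sum(1 for level in repetition_levels if level == 0)


# Made-up levels for three list records: [a, b], [], [c, d, e].
# The empty list still occupies one level entry.
rep_levels = [0, 1, 0, 0, 1, 1]

num_levels = len(rep_levels)             # entries stored in the column
num_records = count_records(rep_levels)  # rows the reader should produce

print(num_levels, num_records)
```

A reader asked for N rows of such a column has to keep decoding until it has seen N record boundaries, which generally means consuming more than N level entries.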

Attachments

part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet (87 kB), attached 31/Aug/17 08:49 by Jarno Seppanen

Issue Links

links to GitHub Pull Request #1043

People

Assignee: Wes McKinney

Reporter: Jarno Seppanen

Votes: 0

Watchers: 5

Dates

Created: 31/Aug/17 08:50

Updated: 05/Oct/17 13:09

Resolved: 20/Sep/17 01:39