[ARROW-5965] [Python] Regression: segfault when reading hive table with v0.14 - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Duplicate
Affects Version/s: 0.14.0
Fix Version/s: 0.15.0
Component/s: Python
Labels:
- parquet

External issue URL:
https://github.com/apache/arrow/issues/16812

Description

I'm working with pyarrow on a Cloudera cluster (CDH 6.1.1), with pyarrow installed in a conda environment.

The data I'm reading is a Hive-registered table stored as Parquet. The table is partitioned, and with v0.13 reading it causes no issues.

The code that worked before and now crashes with v0.14 is simply:

```
import pyarrow.parquet as pq

# Discover all files and partitions of the Hive table on HDFS and read them into one Table
pq.ParquetDataset('hdfs:///data/raw/source/table').read()
```
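Because a crash inside native code takes the whole interpreter down with it, one way to collect more detail than a dead notebook is to run the reproducer in a child process. The sketch below is an assumption, not part of the original report: the helper names `run_isolated` and `describe` are hypothetical, while `-X faulthandler`, `subprocess.run` and `signal.Signals` are standard Python. With `-X faulthandler`, the child prints the Python-level traceback to stderr when it receives a fatal signal such as SIGSEGV, and on POSIX a negative return code tells you which signal ended it.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 3600.0) -> subprocess.CompletedProcess:
    """Hypothetical helper: run `snippet` in a fresh interpreter so a native
    crash only kills the child, not the calling notebook or REPL.

    `-X faulthandler` makes the child dump the traceback of every thread to
    stderr if it is hit by a fatal signal such as SIGSEGV.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(result: subprocess.CompletedProcess) -> str:
    """Summarise how the child ended; on POSIX a return code of -N means signal N."""
    rc = result.returncode
    if rc >= 0:
        return f"exited with status {rc}"
    try:
        return f"killed by {signal.Signals(-rc).name}"
    except ValueError:
        # Signal number not known to this platform's `signal` module
        return f"killed by signal {-rc}"
```

For this issue, the snippet passed to `run_isolated` would be the two lines of the reproducer joined with `;`, after which `describe(result)` and `result.stderr` can be attached to the report. The signal name also separates two failure modes that look alike from the outside: SIGSEGV indicates a segfault, whereas a shell typically prints "Killed" for SIGKILL, which on Linux is often sent by the out-of-memory killer.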

Since it completely crashes my notebook (and in a plain REPL the process ends with "Killed"), I cannot report much more, but this is a severe usability problem. So far, the workaround is to pin `pyarrow<0.14`.
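The pin can also be expressed as a runtime check, so a script fails with a clear message instead of crashing. The sketch below is hypothetical: `is_affected` and `major_minor` are illustrative names, and the affected range [0.14, 0.15) is an assumption read from this issue's Affects Version (0.14.0) and Fix Version (0.15.0) fields; the issue does not state that every 0.14.x release is affected.

```python
import re

# Assumed affected range, derived from the issue metadata:
# Affects Version 0.14.0, Fix Version 0.15.0.
AFFECTED_FROM = (0, 14)
FIXED_IN = (0, 15)


def major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '0.14.1' or '0.14.0.dev5'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_affected(version: str) -> bool:
    """True if `version` falls in the assumed affected range [0.14, 0.15)."""
    return AFFECTED_FROM <= major_minor(version) < FIXED_IN
```

In a script, one would call `is_affected(pyarrow.__version__)` before `pq.ParquetDataset(...).read()` and raise an error when it returns `True`. The declarative pin in the conda or pip environment spec is simpler; a guard like this only helps when the environment is not under your control.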

Issue Links

relates to: ARROW-2652 [C++/Python] Document how to provide information on segfaults (Open)

People

Assignee: Unassigned

Reporter: H. Vetinari

Votes: 0

Watchers: 4

Dates

Created: 17/Jul/19 08:50

Updated: 11/Jan/23 07:43

Resolved: 19/Aug/19 19:41