Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: 0.12.0
- Fix Version: None
Description
Hey, I'm trying to concatenate two Parquet files, and to avoid reading everything into memory at once I wanted to use `read_row_group` for my solution, but it fails.
I think it's due to fields like this:
`pyarrow.Field<to: list<item: string>>`
but I'm not sure. Is this a duplicate? The issue linked in the code has been resolved: https://github.com/apache/arrow/blob/fd0b90a7f7e65fde32af04c4746004a1240914cf/cpp/src/parquet/arrow/reader.cc#L915
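For reference, here is a minimal sketch of the kind of row-group-wise concatenation described above. The file names and output path are assumptions for illustration, not taken from the report:

```python
# Sketch of row-group-wise concatenation; "part1.parquet",
# "part2.parquet", and "combined.parquet" are hypothetical names.
import pyarrow.parquet as pq

writer = None
for path in ["part1.parquet", "part2.parquet"]:
    pf = pq.ParquetFile(path)
    for ix in range(pf.num_row_groups):
        # For nested (list) columns this raises ArrowNotImplementedError
        # when the column chunk is large enough that the reader would have
        # to return a chunked array.
        table = pf.read_row_group(ix)
        if writer is None:
            writer = pq.ParquetWriter("combined.parquet", table.schema)
        writer.write_table(table)
if writer is not None:
    writer.close()
```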
The stack trace is:
File "/data/teftel/teftel-data/teftel_data/parquet_stream.py", line 163, in read_batches
table = pf.read_row_group(ix, columns=self._columns)
File "/home/kuba/.local/share/virtualenvs/teftel-o6G5iH_l/lib/python3.6/site-packages/pyarrow/parquet.py", line 186, in read_row_group
use_threads=use_threads)
File "pyarrow/_parquet.pyx", line 695, in pyarrow._parquet.ParquetReader.read_row_group
File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
pyarrow.lib.ArrowNotImplementedError: Nested data conversions not implemented for chunked array outputs
Issue Links
- is duplicated by:
  - ARROW-17459 [C++] Support nested data conversions for chunked array (Open)
- relates to:
  - ARROW-4688 [C++][Parquet] 16MB limit on (nested) column chunk prevents tuning row_group_size (Resolved)
  - ARROW-3762 [C++] Parquet arrow::Table reads error when overflowing capacity of BinaryArray (Resolved)