[ARROW-7547] [C++] [Python] [Dataset] Additional reader options in ParquetFileFormat - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 0.17.0
Component/s: C++, Python
Labels:
- dataset
- pull-request-available

External issue URL:
https://github.com/apache/arrow/issues/17017

Description

[Looking into using the datasets machinery in the current Python Parquet code.]

In the current Python API, we expose several options that influence how a Parquet file is read: for example, read_dictionary, which reads the listed BYTE_ARRAY columns directly into a dictionary type, as well as memory_map and buffer_size.

These options could also be exposed on ParquetFileFormat.

Issue Links

links to

GitHub Pull Request #6235

People
Assignee: Ben Kietzman
Reporter: Joris Van den Bossche
Votes: 0
Watchers: 4

Dates
Created: 10/Jan/20 10:43
Updated: 11/Jan/23 07:54
Resolved: 23/Feb/20 13:13

Time Tracking
Estimated: Not Specified
Remaining:
Logged: 9h 10m