[ARROW-14176] [Python] Filename-based partitioning scheme - ASF JIRA

Details

Type: Wish
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Python
Labels: None

External issue URL:
https://github.com/apache/arrow/issues/29763

Description

This originates from [this SO question|https://stackoverflow.com/questions/69379083/read-a-partitioned-parquet-dataset-from-multiple-files-with-pyarrow-and-add-a-pa].

The idea is to have a partitioning scheme that would allow a primary key to be constructed from the filename.

Let's say that one is trying to read `/data-N.parquet` where `N` is an integer. That information should go in a primary key for later reference.

This is quite similar to having the files laid out as `/N/data.parquet`, so I imagine this is technically feasible.
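
As a rough illustration of the requested behaviour, here is a minimal sketch of a manual workaround, assuming the `data-N.parquet` naming described above. The helper names `key_from_filename` and `read_with_key` and the column name `N` are hypothetical and are not part of pyarrow; the second helper relies only on `pyarrow.parquet.read_table`, `Table.append_column`, and `pyarrow.concat_tables`.

```python
import re
from pathlib import PurePosixPath

# Assumed naming convention from the description: data-N.parquet, N an integer.
_FILENAME_PATTERN = re.compile(r"^data-(\d+)\.parquet$")


def key_from_filename(path: str) -> int:
    """Return the integer N encoded in a path ending in data-N.parquet."""
    name = PurePosixPath(path).name
    match = _FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    return int(match.group(1))


def read_with_key(paths, column="N"):
    """Read each file and append the filename-derived key as a column.

    Hypothetical helper; pyarrow is imported lazily so that
    key_from_filename stays usable without it.
    """
    import pyarrow as pa
    import pyarrow.parquet as pq

    tables = []
    for path in paths:
        table = pq.read_table(path)
        # One copy of the key per row of this file.
        key = pa.array([key_from_filename(path)] * table.num_rows, type=pa.int64())
        tables.append(table.append_column(column, key))
    # Assumes all files share the same schema.
    return pa.concat_tables(tables)
```

A partitioning scheme built into the dataset API would presumably make this per-file loop unnecessary, in the same way that directory-based partitioning already does for the `/N/data.parquet` layout.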

Issue Links

is related to: ARROW-14612 [C++] Support for filename-based partitioning (Resolved)

People

Assignee: Unassigned

Reporter: Cédric Hernalsteens

Votes: 0

Watchers: 3

Dates

Created: 29/Sep/21 19:32

Updated: 11/Jan/23 08:38