Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version: 0.16.0
Description
Assemble a minimal ParquetDataset shim backed by pyarrow.dataset.*. Replace the existing ParquetDataset with the shim by default, and allow opting out for users who need the current ParquetDataset.
This is mostly exploratory, to see which of the Python tests fail.
Issue Links
- is depended upon by:
  - ARROW-2659 [Python] More graceful reading of empty String columns in ParquetDataset (Open)
  - ARROW-2860 [Python][Parquet][C++] Null values in a single partition of Parquet dataset, results in invalid schema on read (Open)
  - ARROW-6114 [Python] Datatypes not preserved for partition fields in roundtrip to partitioned parquet dataset (Open)
  - ARROW-3861 [Python] ParquetDataset().read columns argument always returns partition column (Resolved)
  - ARROW-5666 [Python] Underscores in partition (string) values are dropped when reading dataset (Resolved)
  - ARROW-5310 [Python] better error message on creating ParquetDataset from empty directory (Resolved)
  - ARROW-5572 [Python] raise error message when passing invalid filter in parquet reading (Resolved)
  - ARROW-2882 [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets (Resolved)
  - ARROW-3388 [C++][Dataset] Automatically detect boolean partition columns (Open)
  - ARROW-2366 [Python][C++][Parquet] Support reading Parquet files having a permutation of column order (Resolved)
  - ARROW-3424 [Python] Improved workflow for loading an arbitrary collection of Parquet files (Resolved)
  - ARROW-1796 [Python] RowGroup filtering on file level (Closed)