Details
- Type: Sub-task
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Fix Version: 0.16.0
Description
Assemble a minimal ParquetDataset shim backed by pyarrow.dataset.*. Replace the existing ParquetDataset with the shim by default, and allow opting out for users who need the current ParquetDataset.
This is mostly exploratory, to see which of the Python tests fail.
Issue Links
- is depended upon by:
  - ARROW-2659 [Python] More graceful reading of empty String columns in ParquetDataset (Open)
  - ARROW-2860 [Python][Parquet][C++] Null values in a single partition of Parquet dataset, results in invalid schema on read (Open)
  - ARROW-6114 [Python] Datatypes not preserved for partition fields in roundtrip to partitioned parquet dataset (Open)
  - ARROW-3861 [Python] ParquetDataset().read columns argument always returns partition column (Resolved)
  - ARROW-5666 [Python] Underscores in partition (string) values are dropped when reading dataset (Resolved)
  - ARROW-5310 [Python] better error message on creating ParquetDataset from empty directory (Resolved)
  - ARROW-5572 [Python] raise error message when passing invalid filter in parquet reading (Resolved)
  - ARROW-2882 [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets (Resolved)
  - ARROW-3388 [C++][Dataset] Automatically detect boolean partition columns (Open)
  - ARROW-2366 [Python][C++][Parquet] Support reading Parquet files having a permutation of column order (Resolved)
  - ARROW-3424 [Python] Improved workflow for loading an arbitrary collection of Parquet files (Resolved)
  - ARROW-1796 [Python] RowGroup filtering on file level (Closed)