Details
- Type: Improvement
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Currently, the parquet.read_table function can be used both for reading a single file (as an interface to ParquetFile) and for reading a directory (as an interface to ParquetDataset).
ParquetDataset has some extra keywords, such as filters, that would be nice to expose through read_table as well.
Of course, one can always use ParquetDataset directly when its full power is needed, but for pandas, which wraps pyarrow, it is easier to pass keywords straight through to parquet.read_table than to choose between calling read_table and ParquetDataset. Context: https://github.com/pandas-dev/pandas/issues/26551
Issue Links
- is related to: ARROW-14772 [Python] unexpected content after groupby on a dataframe restored from partitioned parquet with filters (Closed)
- links to