Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Currently, the filters argument supports List[Tuple], List[List[Tuple]], or None as its input types. I was surprised to see that Expressions were not supported, considering that filters are converted to expressions internally when using use_legacy_dataset=False.
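For readers less familiar with the list-based form, here is a minimal pure-Python sketch of how the three accepted shapes are usually interpreted: a flat list of tuples is a conjunction (AND), and a list of lists is a disjunction (OR) of conjunctions. The names `OPS`, `normalize`, and `matches` are hypothetical and exist only for this illustration; this is not pyarrow's internal conversion code.

```python
import operator

# Hypothetical operator table for this sketch, not a pyarrow API.
OPS = {
    "==": operator.eq, "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def normalize(filters):
    """Return filters as a list of AND-groups, i.e. List[List[Tuple]]."""
    if filters is None:
        return []
    if filters and isinstance(filters[0], tuple):
        # A flat List[Tuple] is a single AND-group.
        return [list(filters)]
    return [list(group) for group in filters]


def matches(row, filters):
    """Check one row (a dict) against filters: OR over groups, AND within a group."""
    groups = normalize(filters)
    if not groups:
        # No filters means every row is kept.
        return True
    return any(
        all(OPS[op](row[col], val) for col, op, val in group)
        for group in groups
    )


row = {"year": 2021, "n_legs": 4}
print(matches(row, [("n_legs", "<", 4)]))
print(matches(row, [[("n_legs", "<", 4)], [("year", "==", 2021)]]))
```

In this sketch a misspelled operator such as `"=<"` only surfaces as a `KeyError` when a row is actually evaluated, which is the kind of late failure that string-based filters invite.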
The check on L150-L153 short-circuits and succeeds when it encounters an expression, but the call later fails on L2343, where the expression is evaluated in a boolean context.
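The failure mode can be reproduced without pyarrow. The sketch below assumes, as the behaviour described above suggests, that an expression object refuses to be coerced to a Python bool. `FakeExpression`, `has_filters_truthy`, and `has_filters_identity` are hypothetical stand-ins written for this illustration and are not the code at the referenced lines.

```python
class FakeExpression:
    """Hypothetical stand-in for an expression that has no truth value."""

    def __init__(self, text):
        self.text = text

    def __bool__(self):
        # Assumption: the real expression type raises in a boolean context.
        raise ValueError("expression cannot be evaluated to True or False")


def has_filters_truthy(filters):
    # Truthiness test: works for lists and None, raises for expressions.
    return bool(filters)


def has_filters_identity(filters):
    # Identity test: never calls __bool__, so expressions pass through safely.
    return filters is not None


expr = FakeExpression("n_legs < 4")
print(has_filters_identity(expr))

try:
    has_filters_truthy(expr)
except ValueError as exc:
    print("truthiness check failed:", exc)
```

Comparing against `None` explicitly, rather than relying on truthiness, is one way to let both list-based filters and expression objects flow through the same code path.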
I think declaring filters using pa.compute.Expression objects is more Pythonic and less error-prone, and ill-formed filters will be detected much earlier than with the list-of-tuples-of-strings equivalents.
Example:
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.parquet as pq

# Creating a dummy table
table = pa.table({
    'year': [2020, 2022, 2021, 2022, 2019, 2021],
    'n_legs': [2, 2, 4, 4, 5, 100],
    'animal': ["Flamingo", "Parrot", "Dog", "Horse", "Brittle stars", "Centipede"]
})
pq.write_to_dataset(table, root_path='dataset_name_2', partition_cols=['year'])

# Reading using 'pyarrow.compute.Expression'
pq.read_table('dataset_name_2', columns=["n_legs", "animal"], filters=pc.field("n_legs") < 4)

# Reading using List[Tuple]
pq.read_table('dataset_name_2', columns=["n_legs", "animal"], filters=[('n_legs', '<', 4)])