Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
7.0.0
None
Description
Passing an expression filter with `is_null()` doesn't filter null values correctly when computing row counts. I have reproduced this with both strings and integers. Here is a reproducer:

    import pandas as pd
    import pyarrow.dataset as ds

    df = pd.DataFrame({"C": pd.array([None, None, 1], dtype=pd.Int64Dtype())})
    print(df)
    df.to_parquet("test.pq")

    # Create a dataset
    dataset = ds.dataset("test.pq")
    fragments = [f for f in dataset.get_fragments()]  # There should just be 1 fragment.
    fragment = fragments[0]

    # Get the null row count
    expr = ds.field("C").is_null()
    scanner = fragment.scanner(filter=expr)
    print(scanner.count_rows())

I expect this to print 2, as there are 2 NULL values.