Details
- Type: Bug
- Status: Resolved
- Priority: Critical
- Resolution: Fixed
- Version: 6.0.1
- Environment: R 4.1.2 on Windows, arrow 6.0.1, dplyr 1.0.7
Description
Hi!
I just found an issue when querying an Arrow dataset with dplyr and filtering on is.na(...).
It seems to be linked to columns that contain only one distinct value plus some NAs.
Can you also reproduce the following?
library(arrow)
library(dplyr)
ds_path = "test-arrow-na"
df = tibble(x=1:3, y=c(0L, 0L, NA_integer_), z=c(0L, 1L, NA_integer_))
df %>% arrow::write_dataset(ds_path)
# OK: Collect then filter: returns row 3, as expected
arrow::open_dataset(ds_path) %>% collect() %>% filter(is.na(y))
# ERROR: Filter then collect (on y) returns a tibble with no rows
arrow::open_dataset(ds_path) %>% filter(is.na(y)) %>% collect()
# OK: Filter then collect (on z) returns row 3, as expected
arrow::open_dataset(ds_path) %>% filter(is.na(z)) %>% collect()
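The comparison above can also be written as a small self-checking script, which may be handy when testing other arrow versions. This is only a sketch: the names `expected` and `pushed_down` are made up for illustration, and the dataset is written to a temporary directory (an assumption, so that repeated runs do not reuse old files).

```r
library(arrow)
library(dplyr)

# Same data as in the report; tempfile() only generates a fresh path,
# write_dataset() creates the directory itself.
ds_path <- tempfile("test-arrow-na")
df <- tibble(x = 1:3, y = c(0L, 0L, NA_integer_), z = c(0L, 1L, NA_integer_))
write_dataset(df, ds_path)

# Reference result: filter the in-memory tibble with plain dplyr.
expected <- df %>% filter(is.na(y))

# Result under test: the is.na() filter is evaluated by Arrow before collect().
pushed_down <- open_dataset(ds_path) %>% filter(is.na(y)) %>% collect()

# With the behaviour described in this report, pushed_down has no rows,
# so the comparison fails; on an unaffected version both contain row 3.
print(nrow(expected))
print(nrow(pushed_down))
print(identical(pushed_down$x, expected$x))
```

Swapping `y` for `z` in both filters gives the case that already works as expected.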
Thanks
Pierre
Issue Links
- relates to
  - ARROW-14725 [C++][Compute] Extract Expression simplification passes to an extensible registry (Open)
  - ARROW-12659 [C++][Compute] Support SimplifyWithGuarantee(is_null(foo), is_valid(foo)) (Resolved)