Details
- Type: New Feature
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
Hi,
Some expressions, such as substr(), grepl(), or str_detect(), are not supported when filtering a dataset opened with open_dataset(). Specifically, the code below:
library(dplyr)
library(arrow)
data = data.frame(a = c("a", "a2", "a3"))
write_parquet(data, "Test_filter/data.parquet")
ds <- open_dataset("Test_filter/")
data_flt <- ds %>% filter(substr(a, 1, 1) == "a")
gives this error:
Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) == "a"
Call collect() first to pull data into R.
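For reference, the workaround that the error message points to can be sketched roughly as below. This is only an illustration: the extra "b1" row and the dir.create() call are additions so that the example runs on its own and the filter has something to exclude. The whole dataset is read into memory by collect() before substr() is evaluated in R.

```r
library(dplyr)
library(arrow)

# Same toy data as above, plus one hypothetical non-matching row ("b1")
data <- data.frame(a = c("a", "a2", "a3", "b1"))

# Assumption: the target directory may not exist yet
dir.create("Test_filter", showWarnings = FALSE)
write_parquet(data, "Test_filter/data.parquet")

ds <- open_dataset("Test_filter/")

# collect() materialises the full dataset as an R data frame,
# so filter() and substr() run in R rather than in Arrow
data_flt <- ds %>%
  collect() %>%
  filter(substr(a, 1, 1) == "a")
```

This returns the expected rows, but only after every row has been loaded into R.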
These expressions would be very helpful, if not necessary, for filtering a very large dataset before collecting it. Is there anything that can be done to implement this feature?
Thank you.
Issue Links
- depends upon
  - ARROW-9856 [R] Add bindings for string compute functions (Resolved)
  - ARROW-10195 [C++] Add string struct extract kernel using re2 (Resolved)
  - ARROW-10557 [C++] Add scalar string slicing/substring extract kernel (Resolved)