  1. Apache Arrow
  2. ARROW-10305

[R] Filter with regular expressions

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0.0
    • Component/s: R
    • Labels: None

    Description

      Hi,

      Some expressions, such as substr(), grepl(), str_detect() and others, are not supported when filtering a dataset (after open_dataset()). Specifically, the code below:

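      library(dplyr)
      library(arrow)
      data <- data.frame(a = c("a", "a2", "a3"))
      # write_parquet() does not create the target directory, so create it first
      dir.create("Test_filter", showWarnings = FALSE)
      write_parquet(data, "Test_filter/data.parquet")
      ds <- open_dataset("Test_filter/")
      # Filtering on substr() is what triggers the error below
      data_flt <- ds %>%
        filter(substr(a, 1, 1) == "a")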

      gives this error:

      Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) == "a"
       Call collect() first to pull data into R.
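
The workaround the error message points to can be sketched as below. This is only an illustration using the same toy data; it assumes the whole dataset fits in memory, which is the limitation this request is about.

```r
library(dplyr)
library(arrow)

# Recreate the small example dataset from the report
dir.create("Test_filter", showWarnings = FALSE)
write_parquet(data.frame(a = c("a", "a2", "a3")), "Test_filter/data.parquet")

ds <- open_dataset("Test_filter/")

# Pull every row into an R data frame first, then filter with plain dplyr.
# substr() is evaluated by R here, not by Arrow.
data_flt <- ds %>%
  collect() %>%
  filter(substr(a, 1, 1) == "a")
```

Because collect() runs before filter(), nothing is pushed down to the dataset scan, so every row is read regardless of the condition.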

      These expressions would be very helpful, if not necessary, for filtering a very large dataset before collecting it. Is there anything that can be done to implement this feature?

      Thank you.
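
A sketch of the kind of filter being requested, for illustration only. It assumes an arrow release in which grepl() inside filter() is translated to an Arrow expression (taken here to be 4.0.0 or later, based on the version listed in the details) and a build with regular expression support; the pattern is a made-up example.

```r
library(dplyr)
library(arrow)

# Recreate the small example dataset from the report
dir.create("Test_filter", showWarnings = FALSE)
write_parquet(data.frame(a = c("a", "a2", "a3")), "Test_filter/data.parquet")

ds <- open_dataset("Test_filter/")

# Assumption: grepl() is supported in dataset filters in the installed arrow version.
# The filter is applied before collect(), so only matching rows reach R.
data_flt <- ds %>%
  filter(grepl("[0-9]$", a)) %>%
  collect()
```

With the toy data, the pattern keeps only the values that end in a digit.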

            People

              Assignee: Unassigned
              Reporter: Pal (palgal)
              Votes: 0
              Watchers: 4
