Apache Arrow / ARROW-10305

[C++][R] Filter datasets with string expressions

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++, R
    • Labels:
      None

      Description

      Hi,

      Some string expressions, such as substr(), grepl(), or str_detect(), are not supported when filtering a dataset opened with open_dataset(). Specifically, the code below:

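      library(dplyr)
      library(arrow)
      # Small example table with a single string column
      data <- data.frame(a = c("a", "a2", "a3"))
      # Create the target directory first, since write_parquet() expects it to exist
      dir.create("Test_filter", showWarnings = FALSE)
      write_parquet(data, "Test_filter/data.parquet")
      ds <- open_dataset("Test_filter/")
      # This filter on a substring is the step that fails
      data_flt <- ds %>%
        filter(substr(a, 1, 1) == "a")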

      gives this error:

      Error: Filter expression not supported for Arrow Datasets: substr(a, 1, 1) == "a"
       Call collect() first to pull data into R.
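
      For reference, the workaround suggested by the error message can be sketched as follows. This is only an illustration: it assumes a temporary directory in place of "Test_filter/" and adds a hypothetical value "b3" so that the filter actually drops a row. It reads the whole dataset into memory before filtering.

```r
library(dplyr)
library(arrow)

# Assumption: a temporary directory stands in for "Test_filter/"
dir <- file.path(tempdir(), "Test_filter")
dir.create(dir, showWarnings = FALSE)

# Assumption: "b3" is a hypothetical value added so the filter removes a row
write_parquet(data.frame(a = c("a", "a2", "b3")), file.path(dir, "data.parquet"))

ds <- open_dataset(dir)

# Workaround: collect() pulls every row into R, then dplyr filters
# the resulting data frame with base R's substr()
data_flt <- ds %>%
  collect() %>%
  filter(substr(a, 1, 1) == "a")
```

      This runs, but only because the filter is applied in R after all rows have been loaded, so it does not help when the dataset is too large to fit in memory.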

      These expressions are very helpful, if not necessary, for filtering a very large dataset before collecting it. Could support for them be implemented?

      Thank you.

            People

            • Assignee:
              Unassigned
              Reporter:
              palgal Pal
            • Votes:
              0
              Watchers:
              3
