Apache Arrow / ARROW-16641

[R] How to filter array columns?


Details

    • Type: Wish
    • Status: Closed
    • Priority: Minor
    • Resolution: Information Provided
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: R
    • Labels: None

    Description

      In our Parquet data, there is a column of an array type (`list<array_element <string>>`) that flags records with various issues. Each record can store multiple values in this column, for example `[A, B, C]`.

      I'm trying to perform a data filtering step and exclude some flagged records.

      Filtering is trivial for regular columns that contain just a single value per record. E.g.,

      library(dplyr)
      flags_to_exclude <- c("A", "B")
      # Keep rows whose single-valued `col` is not one of the excluded flags
      datt %>% filter(!col %in% flags_to_exclude)
      

      Given the array column, is it possible to exclude records with at least one of the flags from `flags_to_exclude` using the arrow R package?

      I really appreciate any advice you can provide!
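      To make the desired behaviour concrete, here is a minimal sketch of the filter on data that has already been pulled into R (for example with `collect()`), where the array column arrives as a list column. The table `datt`, the column name `flags`, and the sample values are hypothetical stand-ins, and this is a plain R/dplyr fallback rather than an Arrow-native filter. Whether the same predicate can be evaluated by arrow before collecting is not established here.

```r
library(dplyr)

# Hypothetical in-memory stand-in for the collected Parquet data:
# `flags` is a list column holding one character vector per record.
datt <- tibble(
  id = 1:4,
  flags = list(c("A", "B", "C"), c("C"), character(0), c("D", "B"))
)

flags_to_exclude <- c("A", "B")

# TRUE for records that carry at least one of the excluded flags.
# An empty flag vector yields FALSE, so unflagged records are kept.
has_excluded <- vapply(
  datt$flags,
  function(x) any(x %in% flags_to_exclude),
  logical(1)
)

# Drop every record with at least one excluded flag
kept <- datt %>% filter(!has_excluded)
kept$id  # 2 3
```

      The per-record `any(x %in% flags_to_exclude)` check is what distinguishes this from the single-value case: `%in%` alone would compare whole list elements rather than the flags inside them.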

            People

              Assignee: wjones127 Will Jones
              Reporter: wjones127 Will Jones
              Votes: 0
              Watchers: 2
