Apache Arrow / ARROW-13107

[R] [C++] Implement SQL-like distinct() for dplyr queries

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: C++, R
    • Labels: None

    Description

      Hi

      It would be useful to be able to obtain a data frame of the unique combinations of the selected columns directly from a dataset query, for example:

      open_dataset("sitc-rev2/parquet/",
                   partitioning = c("Year", "Trade Flow", "Reporter ISO")) %>%
        select(Year, `Reporter ISO`) %>%
        filter(Year >= 1988 & Year <= 1994) %>% 
        distinct() %>% 
        collect()
      

      However, with the current development version of the arrow package (installed from GitHub), this pipeline fails with the following error:

      Error in UseMethod("distinct") : 
        no applicable method for 'distinct' applied to an object of class "arrow_dplyr_query"
      

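      Calling collect() first and then distinct() works, but the deduplication then runs in R after all matching rows have been pulled into memory: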

      reporters_1 <- open_dataset("sitc-rev2/parquet/",
                   partitioning = c("Year", "Trade Flow", "Reporter ISO")) %>%
        select(Year, `Reporter ISO`) %>%
        filter(Year >= 1988 & Year <= 1994) %>% 
        collect() %>% 
        distinct()
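
For reference, the result being requested can be sketched on a small in-memory table. This is only an illustration of the expected semantics, not Arrow code: the `trade` data frame and its values are made up, and base R's `unique()` stands in for `distinct()` so that the snippet runs without any packages.

```r
# Hypothetical stand-in for the two selected columns; the values are invented.
trade <- data.frame(
  Year = c(1988L, 1988L, 1989L, 1989L, 1994L),
  `Reporter ISO` = c("chl", "chl", "chl", "per", "per"),
  check.names = FALSE,        # keep the space in the column name
  stringsAsFactors = FALSE
)

# unique() on a data frame keeps the first occurrence of each distinct row,
# i.e. the same set of rows that SELECT DISTINCT would return.
reporters <- unique(trade)
rownames(reporters) <- NULL   # renumber rows after dropping duplicates

print(reporters)
```

The duplicated (1988, "chl") row is dropped and every other Year / Reporter ISO combination is kept once. The feature request is for the same reduction to happen inside the Arrow query, before collect().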
      

            People

              Assignee: Unassigned
              Reporter: Mauricio 'Pachá' Vargas Sepúlveda (pachamaltese)
              Votes: 0
              Watchers: 4
