ARROW-13434 (Apache Arrow)

[R] group_by() with an unnamed expression


    Details

      Description

      With dplyr, when we group_by() with an unnamed expression, a column holding the result of the expression is added to the data frame, named after the expression text.

      > example_data %>% 
      +   group_by(int < 4) %>% collect()
      # A tibble: 10 x 8
      # Groups:   int < 4 [3]
           int   dbl  dbl2 lgl   false chr   fct   `int < 4`
         <int> <dbl> <dbl> <lgl> <lgl> <chr> <fct> <lgl>    
       1     1   1.1     5 TRUE  FALSE a     a     TRUE     
       2     2   2.1     5 NA    FALSE b     b     TRUE     
       3     3   3.1     5 TRUE  FALSE c     c     TRUE     
       4    NA   4.1     5 FALSE FALSE d     d     NA       
       5     5   5.1     5 TRUE  FALSE e     NA    FALSE    
       6     6   6.1     5 NA    FALSE NA    NA    FALSE    
       7     7   7.1     5 NA    FALSE g     g     FALSE    
       8     8   8.1     5 FALSE FALSE h     h     FALSE    
       9     9  NA       5 FALSE FALSE i     i     FALSE    
      10    10  10.1     5 NA    FALSE j     j     FALSE    
      

      Arrow doesn't do this, however, because we (currently) only add columns when the expression is named. The unnamed expression is instead looked up as a field name, which fails:

      > Table$create(example_data) %>% 
      +   group_by(int < 4) %>% collect()
       Error: Invalid: No match for FieldRef.Name(int < 4) in int: int32
      dbl: double
      dbl2: double
      lgl: bool
      false: bool
      chr: string
      fct: dictionary<values=string, indices=int8, ordered=0> 
      

      This isn't a big deal right now since grouped aggregations aren't (quite) here yet, but once they are supported, people will write calls like this.
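
      A minimal sketch of the gap, assuming a stand-in example_data with only an int column (the real fixture has more columns) and a hypothetical column name lt4. It checks that dplyr's unnamed form matches an explicit mutate() followed by group_by(), then uses the named form, which per the description above is the case Arrow currently handles. The arrow part runs only if the package is installed.

```r
library(dplyr)

# Stand-in for example_data (assumption: only the `int` column matters here)
example_data <- tibble(int = c(1L, 2L, 3L, NA, 5L, 6L, 7L, 8L, 9L, 10L))

# dplyr: unnamed expression; the grouping column is named after the expression
unnamed <- example_data %>% group_by(int < 4)

# Equivalent explicit form: add the column first, then group by it
explicit <- example_data %>%
  mutate(`int < 4` = int < 4) %>%
  group_by(`int < 4`)

# Both forms group by the same column
same_groups <- identical(group_vars(unnamed), group_vars(explicit))

# Named form on an Arrow Table; `lt4` is a hypothetical name chosen for illustration
if (requireNamespace("arrow", quietly = TRUE)) {
  named <- arrow::Table$create(example_data) %>%
    group_by(lt4 = int < 4) %>%
    collect()
}
```

      Naming the expression sidesteps the FieldRef lookup error, at the cost of a column name that differs from what dplyr would generate for the unnamed call.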

            People

            Assignee: Jonathan Keane (jonkeane)
            Reporter: Jonathan Keane (jonkeane)

            Time Tracking

            Original Estimate: Not Specified
            Remaining Estimate: 0h
            Time Spent: 2h 10m
