[ARROW-13434] [R] group_by() with an unnamed expression - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.0.0
Component/s: R
Labels:
- pull-request-available
- query-engine

External issue URL:
https://github.com/apache/arrow/issues/29101

Description

With dplyr, calling group_by() with an unnamed expression adds a column to the data frame that holds the result of the expression, named after the expression text itself (here `int < 4`).

> example_data %>% 
+   group_by(int < 4) %>% collect()
# A tibble: 10 x 8
# Groups:   int < 4 [3]
     int   dbl  dbl2 lgl   false chr   fct   `int < 4`
   <int> <dbl> <dbl> <lgl> <lgl> <chr> <fct> <lgl>    
 1     1   1.1     5 TRUE  FALSE a     a     TRUE     
 2     2   2.1     5 NA    FALSE b     b     TRUE     
 3     3   3.1     5 TRUE  FALSE c     c     TRUE     
 4    NA   4.1     5 FALSE FALSE d     d     NA       
 5     5   5.1     5 TRUE  FALSE e     NA    FALSE    
 6     6   6.1     5 NA    FALSE NA    NA    FALSE    
 7     7   7.1     5 NA    FALSE g     g     FALSE    
 8     8   8.1     5 FALSE FALSE h     h     FALSE    
 9     9  NA       5 FALSE FALSE i     i     FALSE    
10    10  10.1     5 NA    FALSE j     j     FALSE
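The dplyr behaviour above can be sketched in base R. The helper add_group_column() below is hypothetical (it is not part of dplyr or arrow) and only illustrates the idea, assuming the column name is simply the deparsed expression: the unnamed expression is captured unevaluated, deparsed to produce the column name, then evaluated against the data frame's columns.

```r
# Hypothetical helper, NOT a dplyr or arrow function: mimics how
# group_by() materializes an unnamed expression as a new column
# named after the deparsed expression text.
add_group_column <- function(df, expr) {
  expr <- substitute(expr)                       # capture the unevaluated expression
  name <- deparse(expr)                          # e.g. "int < 4"
  df[[name]] <- eval(expr, df, parent.frame())   # evaluate using df's columns
  df
}

example_data <- data.frame(int = c(1L, 2L, 3L, NA, 5L))
out <- add_group_column(example_data, int < 4)
print(names(out))
print(out[["int < 4"]])
```

Here the new column is named "int < 4" and holds TRUE, TRUE, TRUE, NA, FALSE, mirroring the `int < 4` column in the tibble above (NA < 4 yields NA).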

Arrow, however, doesn't do this, because it (currently) only adds a column when the expression is named:

> Table$create(example_data) %>% 
+   group_by(int < 4) %>% collect()
 Error: Invalid: No match for FieldRef.Name(int < 4) in int: int32
dbl: double
dbl2: double
lgl: bool
false: bool
chr: string
fct: dictionary<values=string, indices=int8, ordered=0>

This isn't a big problem yet, since grouped aggregations aren't (quite) supported, but once they are, people will write code like this.

Issue Links

links to

GitHub Pull Request #10785

People

Assignee: Jonathan Keane

Reporter: Jonathan Keane

Votes: 0

Watchers: 3

Dates

Created: 22/Jul/21 13:45

Updated: 11/Jan/23 08:33

Resolved: 24/Jul/21 14:17

Time Tracking

Estimated: Not Specified

Remaining:

Logged: 2h 10m