Details
Type: Bug
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version: 8.0.0
Description
Hello,
I found this odd behaviour when trying to compute an aggregate with dplyr::summarize on an Arrow dataset: when I use a pre-defined variable as the divisor while aggregating, the execution fails with an "is not an aggregate expression or is not supported in Arrow" error. When I write the value of the variable directly into the aggregation, it works.
See below:
library(dplyr)
library(arrow)

small_dataset <- tibble::tibble(
  ## x = rep(c("a", "b"), each = 5),
  y = rep(1:5, 2)
)

## convert "small_dataset" into a ...dataset
tmpdir <- tempfile()
dir.create(tmpdir)
write_dataset(small_dataset, tmpdir)

## works
open_dataset(tmpdir) %>%
  summarize(value = sum(y) / 10) %>%
  collect()

## fails
scale_factor <- 10
open_dataset(tmpdir) %>%
  summarize(value = sum(y) / scale_factor) %>%
  collect()
#> Error: Error in summarize_eval(names(exprs)[i],
#>   exprs[[i]], ctx, length(.data$group_by_vars) > :
#>   Expression sum(y)/scale_factor is not an aggregate
#>   expression or is not supported in Arrow
#>   Call collect() first to pull data into R.
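A possible workaround, sketched here as an assumption rather than something confirmed in this report: inject the value with rlang's `!!` operator (re-exported by dplyr), so that the expression Arrow receives contains the literal 10 instead of the symbol scale_factor. That should make it equivalent to the call that works above. The names `workaround_dir` and `result` are introduced only for this illustration.

```r
library(dplyr)
library(arrow)

# Same toy data as in the report, written to a fresh temporary directory.
small_dataset <- tibble::tibble(y = rep(1:5, 2))
workaround_dir <- tempfile()
dir.create(workaround_dir)
write_dataset(small_dataset, workaround_dir)

scale_factor <- 10

# `!!scale_factor` inlines the value when the expression is captured,
# so summarize() sees `sum(y) / 10` rather than `sum(y) / scale_factor`.
result <- open_dataset(workaround_dir) %>%
  summarize(value = sum(y) / !!scale_factor) %>%
  collect()

print(result)
```

Whether this is needed depends on the installed version; since the issue is marked as fixed, later releases may accept the plain variable directly.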
I was not sure how to name this issue/bug (if it is one), so if there is a clearer, more descriptive title, you're welcome to adjust it.
Thanks for your work!
Oliver
> arrow_info()
Arrow package version: 8.0.0

Capabilities:

dataset    TRUE
substrait FALSE
parquet    TRUE
json       TRUE
s3         TRUE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc   TRUE
mimalloc   TRUE

Memory:

Allocator jemalloc
Current   64 bytes
Max       41.25 Kb

Runtime:

SIMD Level          avx2
Detected SIMD Level avx2

Build:

C++ Library Version  8.0.0
C++ Compiler           GNU
C++ Compiler Version 12.1.0