Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 5.0.0
Description
Arrow crashes (aborting the R session) when a `filter()` on a Dataset is evaluated with `collect()`. For example, following arrow's dplyr vignette (https://cran.r-project.org/web/packages/arrow/vignettes/dataset.html):
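```r
library(arrow)
library(dplyr)

# Assumes the NYC taxi data were downloaded into ./nyc-taxi as in the vignette,
# partitioned into year/month directories
ds <- open_dataset("nyc-taxi", partitioning = c("year", "month"))
x <- ds %>%
  filter(total_amount > 100, year == 2015)
x %>% collect() # crashes R
```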
Note: for simplicity, I downloaded only the years 2009 and 2010, using the R loop provided in the vignette.
I observe this behavior in an RStudio Server instance on an Ubuntu 20.04 Linux server with 128 cores and 256 GB of RAM.
Here is my `sessionInfo()`:
```r
sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.0.7 arrow_5.0.0
loaded via a namespace (and not attached):
[1] fansi_0.5.0 crayon_1.4.1 utf8_1.2.2 assertthat_0.2.1
[5] R6_2.5.1 DBI_1.1.1 lifecycle_1.0.0 magrittr_2.0.1
[9] pillar_1.6.2 rlang_0.4.11 vctrs_0.3.8 generics_0.1.0
[13] ellipsis_0.3.2 tools_4.1.0 bit64_4.0.5 glue_1.4.2
[17] purrr_0.3.4 bit_4.0.4 compiler_4.1.0 pkgconfig_2.0.3
[21] tidyselect_1.1.1 tibble_3.1.3
```
Issue Links
- is duplicated by
  - ARROW-14434 R crashes when making an empty selection for Datasets with DateTime (Closed)
  - ARROW-14307 [R] crashes when reading empty feather with POSIXct column (Closed)
- relates to
  - ARROW-15166 [C++] Implement an array_filter kernel for decimal256 (Resolved)
- links to