Apache Arrow / ARROW-13761

[R] arrow::filter() crashes (aborts R session)

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0.0
    • Fix Version/s: 6.0.0
    • Component/s: R

    Description

      Arrow crashes (aborts the R session) when a `filter()` query is evaluated with `collect()`, e.g. when following the dataset vignette for arrow's dplyr interface: https://cran.r-project.org/web/packages/arrow/vignettes/dataset.html

      ```r
      library(arrow)
      library(dplyr)

      ds <- open_dataset("nyc-taxi", partitioning = c("year", "month"))
      x <- ds %>%
        filter(total_amount > 100, year == 2015)
      x %>% collect() # crashes R
      ```

      (Note: for simplicity I downloaded only the years 2009 and 2010, using the R loop provided in the vignette.)
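
      With only 2009 and 2010 on disk, the `year == 2015` filter matches none of the downloaded partitions. Whether that is related to the crash is not established in this report. The following is a minimal sketch for checking it without the full download. It assumes a small synthetic dataset: `toy`, `toy_dir`, and the values in it are hypothetical stand-ins for the nyc-taxi data, written with `hive_style = FALSE` so the directory layout resembles the vignette's `year/month` folders.

      ```r
      library(arrow)
      library(dplyr)

      # Hypothetical stand-in for the nyc-taxi data: only 2009 and 2010 are present.
      toy <- data.frame(
        year = rep(c(2009L, 2010L), each = 6),
        month = rep(1:3, times = 4),
        total_amount = c(50, 150, 75, 200, 20, 120, 90, 110, 300, 10, 60, 130)
      )

      # Write one directory per year/month, without "key=value" folder names.
      toy_dir <- file.path(tempdir(), "toy-taxi")
      write_dataset(toy, toy_dir, partitioning = c("year", "month"), hive_style = FALSE)

      ds <- open_dataset(toy_dir, partitioning = c("year", "month"))

      # Filter on a year that exists in the dataset.
      hit <- ds %>%
        filter(total_amount > 100, year == 2009) %>%
        collect()

      # Filter on a year that has no partition, mirroring `year == 2015` above.
      miss <- ds %>%
        filter(total_amount > 100, year == 2015) %>%
        collect()
      ```

      If the second query aborts the session while the first returns rows, the empty match is a likely trigger. If both succeed, the crash presumably depends on something specific to the real data or the environment.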

      I observe this behavior in an RStudio Server instance on an Ubuntu 20.04 Linux server with 128 cores and 256 GB RAM.

      Here's my sessionInfo():

      ```r
      sessionInfo()
      R version 4.1.0 (2021-05-18)
      Platform: x86_64-pc-linux-gnu (64-bit)
      Running under: Ubuntu 20.04.2 LTS

      Matrix products: default
      BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

      locale:
      [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
      [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
      [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
      [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
      [9] LC_ADDRESS=C LC_TELEPHONE=C
      [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

      attached base packages:
      [1] stats graphics grDevices utils datasets methods base

      other attached packages:
      [1] dplyr_1.0.7 arrow_5.0.0

      loaded via a namespace (and not attached):
      [1] fansi_0.5.0 crayon_1.4.1 utf8_1.2.2 assertthat_0.2.1
      [5] R6_2.5.1 DBI_1.1.1 lifecycle_1.0.0 magrittr_2.0.1
      [9] pillar_1.6.2 rlang_0.4.11 vctrs_0.3.8 generics_0.1.0
      [13] ellipsis_0.3.2 tools_4.1.0 bit64_4.0.5 glue_1.4.2
      [17] purrr_0.3.4 bit_4.0.4 compiler_4.1.0 pkgconfig_2.0.3
      [21] tidyselect_1.1.1 tibble_3.1.3
      ```

            People

              Assignee: Weston Pace (westonpace)
              Reporter: Carl Boettiger (cboettig)
              Votes: 0
              Watchers: 4

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 20m