Apache Arrow / ARROW-13694

[R] Arrow filter crashes (R aborted session)


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 5.0.0
    • Fix Version/s: 6.0.0
    • Component/s: None
    • Labels: None

    Description

      Hi,

      I am encountering a fatal error with the new version of the Arrow R package (5.0.0) that I did not have with the previous version (4.0.1). After calling "open_dataset", I filter the dataset and collect the result into a data frame, and then RStudio crashes:

      library(dplyr)  # provides %>%, filter(), select() and collect() used below

      ds <- arrow::open_dataset(sources = "XXXX", partitioning = c("XX", "YY", "ZZ"))
      df <- ds %>%
        filter(year >= 2014 & year <= 2020 & type %in% c("XX", "YY") & sector == "ABC" &
                 identifier %in% list_identifiers & type == "LE" & val == "M") %>%
        select(period, obs_value) %>%
        collect()

      If I run the code above without the "filter" step, it works without any problem, so I suspect something goes wrong when the filter expression is evaluated.

      Unfortunately, I cannot share the exact code or a reproducible example of the problem. The dataset is very large, and I have not been able to pin down the precise source of the error. All I know is that RStudio crashes and that this code worked perfectly with the previous version of the package.

      Also, please note that I disabled multithreading with:

      options(arrow.use_threads = FALSE)


People

    • Assignee: Unassigned
    • Reporter: Pal (palgal)
    • Votes: 0
    • Watchers: 5
