Uploaded image for project: 'Apache Arrow'
  1. Apache Arrow
  2. ARROW-17002

[R] dplyr queries create locks on FileSystemDataset files

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Open
    • Minor
    • Resolution: Unresolved
    • 8.0.0
    • None
    • R
    • None

    Description

      I think that dplyr queries on FileSystemDataset objects will create locks that persist unnecessarily. This issue only seems to occur on Windows. I'm using Windows 10. Calling the garbage collector after the dplyr query seems to release the lock.

      library(arrow)
      library(dplyr)
      
      # I can delete an arrow dataset that has been opened
      write_dataset(iris, "iris")
      ds <- open_dataset("iris")
      file.exists("iris")
      #> [1] TRUE
      print(unlink("iris", recursive = T))
      #> [1] 0
      file.exists("iris")
      #> [1] FALSE
      
      # However if I run a dplyr query on the data before deleting it the file is locked.
      write_dataset(iris, "iris")
      ds <- open_dataset("iris")
      file.exists("iris")
      #> [1] TRUE
      
      # I think this adds a lock that is not removed
      ds %>% count() %>% collect()
      #> # A tibble: 1 x 1
      #>       n
      #>   <int>
      #> 1   150
      
      print(unlink("iris", recursive = T))
      #> [1] 1
      file.exists("iris")
      #> [1] TRUE
      print(unlink("iris", recursive = T, force = T))
      #> [1] 1
      file.exists("iris")
      #> [1] TRUE
      file.remove("iris/part-0.parquet")
      #> Warning in file.remove("iris/part-0.parquet"): cannot remove file 'iris/
      #> part-0.parquet', reason 'Permission denied'
      #> [1] FALSE
      
      # running gc() will clean up the lock and allow the file to be deleted
      gc()
      #>           used (Mb) gc trigger  (Mb) max used (Mb)
      #> Ncells 1178845   63    2349652 125.5  1656436 88.5
      #> Vcells 2093715   16    8388608  64.0  3170844 24.2
      print(unlink("iris", recursive = T))
      #> [1] 0
      file.exists("iris")
      #> [1] FALSE
      
      sessioninfo::session_info()
      #> - Session info ---------------------------------------------------------------
      #>  setting  value                       
      #>  version  R version 4.0.5 (2021-03-31)
      #>  os       Windows 10 x64              
      #>  system   x86_64, mingw32             
      #>  ui       RTerm                       
      #>  language (EN)                        
      #>  collate  English_United States.1252  
      #>  ctype    English_United States.1252  
      #>  tz       America/New_York            
      #>  date     2022-07-07                  
      #> 
      #> - Packages -------------------------------------------------------------------
      #>  package     * version date       lib source        
      #>  arrow       * 8.0.0   2022-05-09 [1] CRAN (R 4.0.5)
      #>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.5)
      #>  backports     1.4.0   2021-11-23 [1] CRAN (R 4.0.5)
      #>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.0.5)
      #>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.0.5)
      #>  cli           3.0.1   2021-07-17 [1] CRAN (R 4.0.5)
      #>  crayon        1.5.1   2022-03-26 [1] CRAN (R 4.0.5)
      #>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.0.5)
      #>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.5)
      #>  dplyr       * 1.0.8   2022-02-08 [1] CRAN (R 4.0.5)
      #>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.0.5)
      #>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.5)
      #>  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.0.5)
      #>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.0.5)
      #>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.5)
      #>  generics      0.1.2   2022-01-31 [1] CRAN (R 4.0.5)
      #>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.5)
      #>  highr         0.9     2021-04-16 [1] CRAN (R 4.0.5)
      #>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.0.5)
      #>  knitr         1.36    2021-09-29 [1] CRAN (R 4.0.5)
      #>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.0.5)
      #>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.0.5)
      #>  pillar        1.7.0   2022-02-01 [1] CRAN (R 4.0.5)
      #>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.5)
      #>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.5)
      #>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.0.5)
      #>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.0.5)
      #>  rlang         1.0.2   2022-03-04 [1] CRAN (R 4.0.5)
      #>  rmarkdown     2.10    2021-08-06 [1] CRAN (R 4.0.5)
      #>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.0.5)
      #>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.5)
      #>  stringi       1.7.5   2021-10-04 [1] CRAN (R 4.0.5)
      #>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.5)
      #>  styler        1.5.1   2021-07-13 [1] CRAN (R 4.0.5)
      #>  tibble        3.1.2   2021-05-16 [1] CRAN (R 4.0.5)
      #>  tidyselect    1.1.2   2022-02-21 [1] CRAN (R 4.0.5)
      #>  tzdb          0.2.0   2021-10-27 [1] CRAN (R 4.0.5)
      #>  utf8          1.2.1   2021-03-12 [1] CRAN (R 4.0.5)
      #>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.0.5)
      #>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.0.5)
      #>  xfun          0.25    2021-08-06 [1] CRAN (R 4.0.5)
      #>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.5)
      #> 
      #> [1] C:/Users/adam.DESKTOP-D3KQQA1/Documents/R/win-library/4.0
      #> [2] C:/Program Files/R/R-4.0.5/library
      

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              adam.black Adam Black
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated: