Details
-
Bug
-
Status: Open
-
Minor
-
Resolution: Unresolved
-
8.0.0
-
None
-
None
Description
I think that dplyr queries on FileSystemDataset objects will create locks that persist unnecessarily. This issue only seems to occur on Windows. I'm using Windows 10. Calling the garbage collector after the dplyr query seems to release the lock.
library(arrow) library(dplyr) # I can delete an arrow dataset that has been opened write_dataset(iris, "iris") ds <- open_dataset("iris") file.exists("iris") #> [1] TRUE print(unlink("iris", recursive = T)) #> [1] 0 file.exists("iris") #> [1] FALSE # However if I run a dplyr query on the data before deleting it the file is locked. write_dataset(iris, "iris") ds <- open_dataset("iris") file.exists("iris") #> [1] TRUE # I think this adds a lock that is not removed ds %>% count() %>% collect() #> # A tibble: 1 x 1 #> n #> <int> #> 1 150 print(unlink("iris", recursive = T)) #> [1] 1 file.exists("iris") #> [1] TRUE print(unlink("iris", recursive = T, force = T)) #> [1] 1 file.exists("iris") #> [1] TRUE file.remove("iris/part-0.parquet") #> Warning in file.remove("iris/part-0.parquet"): cannot remove file 'iris/ #> part-0.parquet', reason 'Permission denied' #> [1] FALSE # running gc() will clean up the lock and allow the file to be deleted gc() #> used (Mb) gc trigger (Mb) max used (Mb) #> Ncells 1178845 63 2349652 125.5 1656436 88.5 #> Vcells 2093715 16 8388608 64.0 3170844 24.2 print(unlink("iris", recursive = T)) #> [1] 0 file.exists("iris") #> [1] FALSE sessioninfo::session_info() #> - Session info --------------------------------------------------------------- #> setting value #> version R version 4.0.5 (2021-03-31) #> os Windows 10 x64 #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate English_United States.1252 #> ctype English_United States.1252 #> tz America/New_York #> date 2022-07-07 #> #> - Packages ------------------------------------------------------------------- #> package * version date lib source #> arrow * 8.0.0 2022-05-09 [1] CRAN (R 4.0.5) #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.5) #> backports 1.4.0 2021-11-23 [1] CRAN (R 4.0.5) #> bit 4.0.4 2020-08-04 [1] CRAN (R 4.0.5) #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.0.5) #> cli 3.0.1 2021-07-17 [1] CRAN (R 4.0.5) #> crayon 1.5.1 2022-03-26 [1] CRAN (R 4.0.5) #> DBI 1.1.2 2021-12-20 [1] CRAN (R 4.0.5) #> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.5) #> dplyr * 1.0.8 2022-02-08 [1] CRAN (R 4.0.5) #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.5) #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.5) #> fansi 0.5.0 2021-05-25 [1] CRAN (R 4.0.5) #> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.5) #> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.5) #> generics 0.1.2 2022-01-31 [1] CRAN (R 4.0.5) #> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.5) #> highr 0.9 2021-04-16 [1] CRAN (R 4.0.5) #> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.0.5) #> knitr 1.36 2021-09-29 [1] CRAN (R 4.0.5) #> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.0.5) #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.5) #> pillar 1.7.0 2022-02-01 [1] CRAN (R 4.0.5) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.5) #> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.5) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.0.5) #> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.0.5) #> rlang 1.0.2 2022-03-04 [1] CRAN (R 4.0.5) #> rmarkdown 2.10 2021-08-06 [1] CRAN (R 4.0.5) #> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.5) #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.5) #> stringi 1.7.5 2021-10-04 [1] CRAN (R 4.0.5) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.5) #> styler 1.5.1 2021-07-13 [1] CRAN (R 4.0.5) #> tibble 3.1.2 2021-05-16 [1] CRAN (R 4.0.5) #> tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.0.5) #> tzdb 0.2.0 2021-10-27 [1] CRAN (R 4.0.5) #> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.5) #> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.5) #> withr 2.5.0 2022-03-03 [1] CRAN (R 4.0.5) #> xfun 0.25 2021-08-06 [1] CRAN (R 4.0.5) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.5) #> #> [1] C:/Users/adam.DESKTOP-D3KQQA1/Documents/R/win-library/4.0 #> [2] C:/Program Files/R/R-4.0.5/library
Attachments
Issue Links
- duplicates
-
ARROW-16421 [R] Permission error on Windows when deleting file previously accessed with open_dataset
- Open
- relates to
-
ARROW-16452 [R] After dataset scan, some RAM is left consumed until a garbage collection pass
- Open