Apache Arrow / ARROW-17444

[R] Windows Only: Cannot delete file previously accessed with open_dataset


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 8.0.0, 8.0.1, 9.0.0
    • Fix Version/s: None
    • Component/s: R
    • Labels: None
    • Environment: Windows 10
      R 4.2.1
      RStudio 22.07.1
      Arrow 9.0 (fails on arrow 8 as well)

    Description

      Hello,

      I ran into this issue because it breaks my tests when I run `rhub::check_for_cran()`.

      That is how I know it only affects Windows: the checks on all other operating systems pass.

      If you write files to a directory using one of arrow's `write_*` functions and then call `collect(open_dataset(directory))`, you can no longer delete a file in that directory; the deletion fails with an error. This is best demonstrated with a reprex:

      # setup ------------------------------------------------------------------------
      library(arrow)
      library(dplyr) # for collect() and %>%
      
      local_prefix <- tempfile()
      df <- data.frame(a = 1:5, b = letters[1:5])
      
      # works fine -------------------------------------------------------------------
      
      fs <- LocalFileSystem$create()
      fs$CreateDir(local_prefix)
      fsdir <- fs$cd(local_prefix)
      write_parquet(df, fsdir$path("1.parquet"))
      
      # open_dataset(local_prefix) %>% collect()  # not called in this case
      
      fsdir$DeleteFile("1.parquet")
      unlink(local_prefix, recursive = TRUE)
      
      # doesn't work -----------------------------------------------------------------
      
      fs <- LocalFileSystem$create()
      fs$CreateDir(local_prefix)
      fsdir <- fs$cd(local_prefix)
      write_parquet(df, fsdir$path("1.parquet"))
      
      open_dataset(local_prefix) %>% collect() # <-- ERROR IS CAUSED BY THIS
      
      fsdir$DeleteFile("1.parquet") # <-- HERE IS WHERE YOU GET AN ERROR
      unlink(local_prefix, recursive = TRUE)

      Here is the error I keep getting:

      Error: IOError: Cannot delete file 'C:/Users/riaz/AppData/Local/Temp/Rtmp8qUlcx/file233c22f923d0/1.parquet'. Detail: [Windows error 32] The process cannot access the file because it is being used by another process.

      Note that:

      • I do not assign the result of `open_dataset` to an object; I simply call it.
      • I also call `collect` to pull the data into R, so I cannot see why a connection to the file should still exist after `collect` returns.
      • As mentioned above, no other OS exhibits this behaviour.
      • My environment pane looks identical in both cases.
      • I do not need to restart R to delete the file: if I clear all objects from the workspace (`rm(list = ls())`), the deletion then works fine.
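The last note suggests that the file is held open by something unreferenced rather than by anything visible in the environment. A minimal sketch to test that idea: it repeats the failing case, keeps only the collected data frame, and forces a garbage collection with `gc()` before deleting. It assumes (not confirmed here) that the handle belongs to the temporary Dataset object created by `open_dataset()` and is released when R finalizes that object; if `DeleteFile()` still fails after `gc()`, the handle is held somewhere else.

```r
library(arrow)
library(dplyr) # for collect() and %>%

local_prefix <- tempfile()
df <- data.frame(a = 1:5, b = letters[1:5])

fs <- LocalFileSystem$create()
fs$CreateDir(local_prefix)
fsdir <- fs$cd(local_prefix)
write_parquet(df, fsdir$path("1.parquet"))

# Keep only the collected data frame; the Dataset returned by
# open_dataset() is never bound to a name.
result <- open_dataset(local_prefix) %>% collect()

# Assumption (unverified): a full garbage collection finalizes the
# unreferenced Dataset object and releases its handle on 1.parquet.
invisible(gc())

# If the assumption holds, this no longer fails on Windows.
fsdir$DeleteFile("1.parquet")
unlink(local_prefix, recursive = TRUE)
```

On Linux and macOS the deletion succeeds with or without the `gc()` call, so the result is only informative when run on Windows.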


            People

              Assignee: Unassigned
              Reporter: Riaz Arbi (riazarbi)
              Votes: 0
              Watchers: 4
