  1. Geronimo
  2. GERONIMO-3489

Deployment problems caused by file deletion failures

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.2, 2.1
    • Component/s: deployment
    • Security Level: public (Regular issues)
    • Labels: None

    Description

      File.delete() failures in IOUtil.recursiveDelete() are causing various deployment problems. I am opening this JIRA to discuss them and to see how the server might handle them better. In all but one case, delete failures are not even noted with a log record! Deletion problems are seen in many environments and on many platforms, but they are persistently fatal when an NFS file system is used for the repository.

      In investigating the problem, I added code to recursiveDelete to retry the delete a few times if it fails. I also added code to list the directory contents if a directory delete failed, and saw a file named .nfs000000002bc43500000053e in the directory. My first attempt at a bypass was to retry a failed delete 5 times, sleeping a second before each try. This did not work. I then added a call to System.gc() before each sleep, and this got me past the problem. Interestingly, two retries were required to get this to work. In another version, each retry slept a second longer than the previous one, and I printed all file names in the directory before trying the delete. This worked in most cases, but required the full 5 retries, so I suspect System.gc() would have saved time. System.runFinalization() would be something else to try.
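The retry approach described above can be sketched roughly as follows. This is a minimal illustration, not the actual IOUtil.recursiveDelete() code or any of the attached patches: the class name RetryingDelete, the method deleteWithRetry, and its parameters are hypothetical, and System.err stands in for whatever logging the server uses. It assumes, as the experiment suggests, that a System.gc() call between attempts can release unreachable open streams.

```java
import java.io.File;

/** Hypothetical helper for illustration; not the Geronimo IOUtil implementation. */
final class RetryingDelete {

    private RetryingDelete() {}

    /**
     * Tries to delete root and everything below it, retrying up to maxRetries
     * times. Returns true once root no longer exists, false if it survives.
     */
    static boolean deleteWithRetry(File root, int maxRetries, long sleepMillis) {
        for (int attempt = 0; ; attempt++) {
            if (deleteTree(root)) {
                return true;
            }
            // Record the failure instead of swallowing it silently.
            System.err.println("Delete failed (attempt " + (attempt + 1) + "): " + root);
            String[] leftovers = root.list(); // null if root is not a directory
            if (leftovers != null) {
                for (String name : leftovers) {
                    System.err.println("  still present: " + name);
                }
            }
            if (attempt >= maxRetries) {
                return false;
            }
            // Assumption: garbage collection may close streams nobody references.
            System.gc();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return !root.exists();
            }
        }
    }

    /** One depth-first pass: children first, then the file or directory itself. */
    private static boolean deleteTree(File file) {
        File[] children = file.listFiles(); // null for regular files
        if (children != null) {
            for (File child : children) {
                deleteTree(child);
            }
        }
        return file.delete() || !file.exists();
    }
}
```

Each attempt walks the whole tree again, so files that reappear between attempts (such as the .nfs file seen above) get another chance to be removed. A caller such as undeploy could treat a false return as a reason not to report success.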

      RepositoryConfigurationStore.createNewConfigurationDir(Artifact) shows the failing end of the deletion problem, with the dreaded ConfigurationAlreadyExistsException("Configuration already exists: " + configId) exception. I think this message is misleading; it should really say that the directory already exists. If the file is not deleted on undeploy, this failure occurs on a subsequent deploy. What is really bad is when the user invokes a redeploy operation and the file delete fails during the undeploy step. It is important that undeploy not complete until the file is gone.

      From what I have seen in other environments, I am not convinced that all file handles and references, particularly open streams, are being closed on some artifacts. This will cause the delete to fail. It may be that the gc() calls are cleaning these up and allowing the deletes to work in my case above.
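To illustrate the open-stream concern, here is a small sketch of reading a file so that the stream is guaranteed to be closed before anyone tries to delete the file. The class ClosedBeforeDelete and the method readFirstLine are hypothetical names used only for this example; they do not come from the Geronimo code base. The sketch uses try-with-resources, which needs Java 7 or later; on older Java levels an explicit try/finally with close() gives the same guarantee.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical example class; shows deterministic stream closing. */
final class ClosedBeforeDelete {

    private ClosedBeforeDelete() {}

    /** Reads the first line of file; the reader is closed even if reading fails. */
    static String readFirstLine(File file) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
        // No open handle remains here, so a later File.delete() does not have
        // to wait for garbage collection to release the stream.
    }
}
```

Closing streams deterministically like this would remove the dependence on System.gc() for this class of failure, if unclosed streams are indeed the cause.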

      Another option is for RepositoryConfigurationStore.createNewConfigurationDir(Artifact) not to throw a ConfigurationAlreadyExistsException if the only problem is that an empty directory structure exists. The next line creates the directory structure anyway.
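The check this option would need could look something like the sketch below. It is only an illustration under assumptions: EmptyTreeCheck and isEmptyDirectoryStructure are hypothetical names, and how such a check would be wired into createNewConfigurationDir is not shown here.

```java
import java.io.File;

/** Hypothetical helper; not part of RepositoryConfigurationStore. */
final class EmptyTreeCheck {

    private EmptyTreeCheck() {}

    /**
     * Returns true if dir does not exist, or if it is a directory that
     * contains nothing but (possibly nested) empty directories.
     */
    static boolean isEmptyDirectoryStructure(File dir) {
        if (!dir.exists()) {
            return true;
        }
        File[] children = dir.listFiles();
        if (children == null) {
            return false; // a regular file, or a directory that cannot be read
        }
        for (File child : children) {
            if (!isEmptyDirectoryStructure(child)) {
                return false;
            }
        }
        return true;
    }
}
```

With a check like this, the exception would be reserved for directories that still hold real content, while leftover empty directories from a partial undeploy would simply be reused.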

      Attachments

        1. G3489-3.patch
          1 kB
          Ted Kirby
        2. G3489-2.patch
          3 kB
          Ted Kirby
        3. G3489-1.patch
          3 kB
          Ted Kirby

          People

            Assignee: drwoods Donald Woods
            Reporter: tkirby Ted Kirby