Ivy / IVY-1388

*.lck files created by "artifact-lock" lock strategy are not cleaned up if ivy quits abruptly

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0, 2.3.0-RC1, master
    • Fix Version/s: 2.3.0
    • Component/s: None
    • Labels:
      None

      Description

      We have a few build processes running in parallel, all of which share the same ivy cache. To avoid parallel-download problems, we enabled the artifact-lock lock strategy. An annoying problem with this strategy is that the *.lck files created as locks are not cleaned up if the build exits abruptly, for example if you press Ctrl-C to quit the build process while ivy is downloading a big jar. Those lock files remain in the metadatas folder. The same happens if ivy encounters an error and fails the build.

      Those leftover lock files cause problems when you start the build again. The build process hangs while ivy waits for those "locks" to be released, which never happens automatically, so the build eventually fails with the ivy error "impossible to acquire lock for xxxyyyzzzz". What makes it worse is that ivy doesn't fail fast at the default 2-minute lock timeout; it seems to retry a few times. In the worst cases, we have seen the build hang for 12 minutes before timing out and failing.
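      The hang described above can be reproduced in miniature. The following is a minimal sketch, not Ivy's actual code: the class name StaleLockDemo, the acquire method, and the short timeouts are all assumptions chosen for illustration. It only shows why a lock file that nobody owns any more makes a create-file-based lock wait out its full timeout.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical illustration class; this is not Ivy's FileBasedLockStrategy.
class StaleLockDemo {

    /**
     * Tries to take a lock by atomically creating lockFile, polling until
     * timeoutMillis has elapsed. Returns true if the lock was acquired.
     */
    static boolean acquire(File lockFile, long timeoutMillis, long sleepMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        do {
            // createNewFile is atomic: it succeeds only if the file did not exist.
            if (lockFile.createNewFile()) {
                return true;
            }
            Thread.sleep(sleepMillis);
        } while (System.currentTimeMillis() < deadline);
        return false; // nobody released the lock within the timeout
    }

    public static void main(String[] args) throws Exception {
        // The temp file already exists, standing in for a lock left by a killed build.
        File lock = File.createTempFile("artifact", ".lck");
        System.out.println("acquired with stale lock present: " + acquire(lock, 300, 50));

        // Manual cleanup, which is what users currently have to do by hand.
        lock.delete();
        System.out.println("acquired after manual cleanup: " + acquire(lock, 300, 50));
        lock.delete();
    }
}
```

      With a 2-minute timeout and several retries instead of 300 ms, the same loop would account for the multi-minute hangs reported here.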

      We believe this can be fixed by calling deleteOnExit() on the lock file in the FileBasedLockStrategy.java class. We have implemented the fix and it works well.
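      As a rough sketch of the idea (not the attached patch itself), the lock file can be registered for deletion at JVM exit right after it is created. The class DeleteOnExitLock and its method names are hypothetical; only File.createNewFile() and File.deleteOnExit() are real JDK calls.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch; the real change lives in Ivy's FileBasedLockStrategy.
class DeleteOnExitLock {

    /** Creates the lock file atomically and asks the JVM to delete it at exit. */
    static boolean tryAcquire(File lockFile) throws IOException {
        if (lockFile.createNewFile()) {
            // Registered by path. The deletion is attempted when the JVM shuts
            // down in an orderly way; it cannot run after kill -9 or power loss.
            lockFile.deleteOnExit();
            return true;
        }
        return false; // someone else holds the lock
    }

    /** Normal release path: remove the lock file explicitly. */
    static void release(File lockFile) {
        lockFile.delete();
    }

    public static void main(String[] args) throws Exception {
        File lock = new File(System.getProperty("java.io.tmpdir"),
                "demo-" + System.nanoTime() + ".lck");
        System.out.println("first acquire:  " + tryAcquire(lock));
        System.out.println("second acquire: " + tryAcquire(lock));
        // No release() here on purpose: the file is removed when the JVM exits.
    }
}
```

      Note that the registration is tied to the file path rather than to this particular lock acquisition, which matters once several processes reuse the same lock file name.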

      A patch is provided; could anyone quickly apply this one-line change?

      1. patch1.patch
        2 kB
        Wei Chen
      2. FileBasedLockStrategy.java.patch
        0.5 kB
        Wei Chen

        Issue Links

          Activity

          Transition: Open → Resolved
          Time In Source Status: 37d 7h 31m
          Execution Times: 1
          Last Executer: Nicolas Lalevée
          Last Execution Date: 16/Dec/12 23:25
          Charles Duffy added a comment -

          Not by any means arguing that the current behavior with the default locking strategy (which, for aforementioned backwards compatibility reasons, needs to remain default) is anything other than a bug. (As an aside – In NIO mode, it's best practice to leave lockfiles behind unconditionally, as not doing so greatly increases the surface for race conditions).

          Feel free to CC me on any such discussion you start. I'm in the middle of a release in the day job right now, but if nobody else gets to the ticket before then, am reasonably likely to have some free time to poke at Ivy again in December or so.

          Shawn Heisey made changes -
          Link This issue relates to IVY-1489 [ IVY-1489 ]
          Shawn Heisey added a comment -

          Since I don't actually know how to integrate ivy with ant, I don't know how to use the lock strategy info you gave me. I can start a discussion on dev@lucene.a.o.

          I believe that this behavior should be considered a bug. If something catastrophic had happened, like a kernel panic or losing power, then I could understand the lock file being left behind ... but it's happening as a result of ctrl-c. I would have expected the shutdown hook (fix for this issue) to have dealt with that problem, but apparently the fix wasn't complete enough.

          I have already filed IVY-1489 for the continuing problems I've observed.

          Charles Duffy added a comment - - edited

          Any locking support detection is done by the JVM – if you were on a filesystem where neither fcntl-based locking nor flock were supported, I wouldn't be surprised to see an exception result when trying to use NIO locking, but I'd have to read the library documentation to know whether that behavior is defined.

          A patch to enable NIO whenever the platform provides it strikes me as a good idea, but difficult to implement in practice given the need to retain compatibility across defaults for mixed-release environments (upgrading only a subset of nodes accessing the same backend). Presently, then, enabling NIO locking is runtime configuration; see http://ant.apache.org/ivy/history/latest-milestone/settings/lock-strategies.html.

          Shawn Heisey added a comment -

          It is fairly repeatable, yes. I have a bad tendency to start a build before I've finished all my modifications ... I remember something just after I've started it, so I ctrl-c them frequently. It seems to happen at just the wrong moment at least a third of the time.

          The ivy integration in the lucene/solr build was not created by me, but I'm a committer on that project. I'm somewhat leery of making a switch to NIO for the default, because we have no idea what kind of filesystem our users will be running on. Locking should work well on a native filesystem, but our users might be using a gluster filesystem re-shared via samba ... and locking doesn't work on a filesystem like that.

          How do I switch to NIO in an ant build? Is there any way for me to easily detect that my home directory is on a filesystem where fcntl locking won't work? Can ivy itself make that determination and choose the locking type when it runs?

          Charles Duffy added a comment -

          Shawn – if this is a repeated issue, I'd strongly suggest using the NIO backend, which uses flock- or fcntl-based locking (or the local equivalent on win32) to provide locks which, while the files may still exist on the filesystem, will not be honored after any kind of abrupt exit, up to and including unanticipated power loss.
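          To make the difference concrete, here is a minimal sketch of OS-level locking through java.nio, assuming nothing about how Ivy's NIO backend is actually written. The class NioLockDemo and the withLock helper are hypothetical names; FileChannel.tryLock() and FileLock.release() are the standard JDK calls. The lock belongs to the open channel, so the operating system drops it when the process dies, even though the file itself stays on disk.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

// Hypothetical illustration; not Ivy's NIO lock strategy.
class NioLockDemo {

    /**
     * Runs action while holding an exclusive OS-level lock on lockFile.
     * Returns false if another process currently holds the lock.
     */
    static boolean withLock(File lockFile, Runnable action) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileChannel channel = raf.getChannel()) {
            // tryLock returns null when another process holds the lock. It throws
            // OverlappingFileLockException if this same JVM already holds it.
            FileLock lock = channel.tryLock();
            if (lock == null) {
                return false;
            }
            try {
                action.run();
                return true;
            } finally {
                lock.release();
            }
        }
        // The lock file is deliberately left in place: its mere existence means
        // nothing, only a live OS lock on it does.
    }

    public static void main(String[] args) throws Exception {
        File lockFile = File.createTempFile("artifact", ".lck");
        boolean ran = withLock(lockFile, () -> System.out.println("holding the lock"));
        System.out.println("action ran: " + ran);
        lockFile.delete();
    }
}
```

          Whether such locks behave correctly depends on the filesystem, which is the concern raised about network shares elsewhere in this thread.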

          Anurag Sharma added a comment - - edited

          I am also facing the same issue. Here is the detailed log when running in verbose mode (ant compile -verbose):

          Overriding previous definition of property "ivy.version"
          [ivy:retrieve] no resolved descriptor found: launching default resolve
          Overriding previous definition of property "ivy.version"
          [ivy:retrieve] using ivy parser to parse file:/C:/work/trunk/solr/example/ivy.xml
          [ivy:retrieve] :: resolving dependencies :: org.apache.solr#example;working@dev-pc
          [ivy:retrieve]  confs: [logging]
          [ivy:retrieve]  validate = true
          [ivy:retrieve]  refresh = false
          [ivy:retrieve] resolving dependencies for configuration 'logging'
          [ivy:retrieve] == resolving dependencies for org.apache.solr#example;working@dev-pc [logging]
          [ivy:retrieve] == resolving dependencies org.apache.solr#example;working@dev-pc->log4j#log4j;1.2.17 [logging->master]
          [ivy:retrieve] default: Checking cache for: dependency: log4j#log4j;1.2.17 {logging=[master]}
          [ivy:retrieve] don't use cache for log4j#log4j;1.2.17: checkModified=true
          [ivy:retrieve]          tried C:\Users\user1.dev-pc\.ivy2\local\log4j\log4j\1.2.17\ivys\ivy.xml
          [ivy:retrieve]          tried C:\Users\user1.dev-pc\.ivy2\local\log4j\log4j\1.2.17\jars\log4j.jar
          [ivy:retrieve]  local: no ivy file nor artifact found for log4j#log4j;1.2.17
          [ivy:retrieve] main: Checking cache for: dependency: log4j#log4j;1.2.17 {logging=[master]}
          [ivy:retrieve] main: module revision found in cache: log4j#log4j;1.2.17
          [ivy:retrieve]  found log4j#log4j;1.2.17 in public
          [ivy:retrieve] == resolving dependencies org.apache.solr#example;working@dev-pc->org.slf4j#slf4j-api;1.7.6 [logging->master]
          [ivy:retrieve] default: Checking cache for: dependency: org.slf4j#slf4j-api;1.7.6 {logging=[master]}
          [ivy:retrieve] don't use cache for org.slf4j#slf4j-api;1.7.6: checkModified=true
          [ivy:retrieve] ERROR: impossible to acquire lock for org.slf4j#slf4j-api;1.7.6
          [ivy:retrieve]          tried C:\Users\user1.dev-pc\.ivy2\local\org.slf4j\slf4j-api\1.7.6\ivys\ivy.xml
          [ivy:retrieve]          tried C:\Users\user1.dev-pc\.ivy2\local\org.slf4j\slf4j-api\1.7.6\jars\slf4j-api.jar
          [ivy:retrieve]  local: no ivy file nor artifact found for org.slf4j#slf4j-api;1.7.6
          [ivy:retrieve] main: Checking cache for: dependency: org.slf4j#slf4j-api;1.7.6 {logging=[master]}
          [ivy:retrieve] ERROR: impossible to acquire lock for org.slf4j#slf4j-api;1.7.6
          [ivy:retrieve] ERROR: impossible to acquire lock for org.slf4j#slf4j-api;1.7.6
          [ivy:retrieve]          tried C:\Users\user1.dev-pc\.ivy2\shared\org.slf4j\slf4j-api\1.7.6\ivys\ivy.xml
          [ivy:retrieve]          tried C:\Users\user1.dev-pc\.ivy2\shared\org.slf4j\slf4j-api\1.7.6\jars\slf4j-api.jar
          [ivy:retrieve]  shared: no ivy file nor artifact found for org.slf4j#slf4j-api;1.7.6
          

          "ant clean" also doesn't make a difference but deleting the directory ".ivy\cache\org.slf4j\slf4j-api" resolves compilation.

          Shawn Heisey added a comment -

          I still sometimes get lck files left over when I interrupt the lucene/solr build process with Ctrl-C, even though I have upgraded to ivy 2.3.0, so it gets stuck forever at the resolve step. It does happen much less often than it did before the ivy upgrade, though. Should I file a new issue?

          I see this when it hangs:

          ivy-configure:
          [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ ::
          [ivy:configure] :: loading settings :: file = /index/src/branch_4x/lucene/ivy-settings.xml
          
          resolve:
          
          Gavin made changes -
          Link This issue is depended upon by LUCENE-4636 [ LUCENE-4636 ]
          Gavin made changes -
          Link This issue blocks LUCENE-4636 [ LUCENE-4636 ]
          Maarten Coene made changes -
          Fix Version/s trunk [ 12320744 ]
          Shawn Heisey made changes -
          Link This issue blocks LUCENE-4636 [ LUCENE-4636 ]
          Nicolas Lalevée made changes -
          Fix Version/s 2.3.0 [ 12323508 ]
          Nicolas Lalevée added a comment -

          It's marked as a bug. Hence I have merged the patch into the 2.3.x branch.

          Shawn Heisey added a comment -

          Will this be in a released version soon? I am guessing that would mean backporting to 2.3.

          I contribute to the Lucene (Solr) project, which uses ivy. Hangs caused by .lck files happen quite often. I finally used strace to track down the culprit, which led me here. The "fix" currently shared by Lucene developers is to wipe the ~/.ivy2 directory and try again. That makes things take longer because it has to re-download a lot of jars.

          It would be awesome if the Lucene project could update their ivy-bootstrap target to include a version with this fix.

          Nicolas Lalevée made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Assignee Nicolas Lalevée [ hibou ]
          Fix Version/s trunk [ 12320744 ]
          Resolution Fixed [ 1 ]
          Nicolas Lalevée added a comment -

          patch applied. Thanks !

          Wei Chen added a comment -

          Good point, Maarten. I just checked the implementation of DeleteOnExitHook: it deletes the file at the given file path. So you are right.

          The Java-provided DeleteOnExitHook is not visible and also doesn't offer a remove(). So I implemented a simple one; see patch1. I hope this addresses your concern.
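          A hook of this kind might look roughly like the following. This is a sketch written for illustration, not the contents of patch1.patch; the class name LockCleanupHook and its add/remove methods are assumptions. The point is that a lock file is unregistered when it is released normally, so the hook only ever deletes locks this process still holds at shutdown.

```java
import java.io.File;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of a delete-on-exit hook that supports remove().
class LockCleanupHook extends Thread {

    private final Set<File> files = Collections.synchronizedSet(new LinkedHashSet<>());

    /** Call after the lock file has been created. */
    void add(File lockFile) {
        files.add(lockFile);
    }

    /** Call when the lock is released normally, so the hook forgets the file. */
    void remove(File lockFile) {
        files.remove(lockFile);
    }

    /** Runs at JVM shutdown: deletes only the locks still held by this process. */
    @Override
    public void run() {
        synchronized (files) {
            for (File f : files) {
                f.delete();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        LockCleanupHook hook = new LockCleanupHook();
        Runtime.getRuntime().addShutdownHook(hook);

        File lock = File.createTempFile("artifact", ".lck");
        hook.add(lock);
        System.out.println("lock registered; it is deleted at exit unless released first");
        // A normal release would be: lock.delete(); hook.remove(lock);
    }
}
```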

          Wei Chen made changes -
          Attachment patch1.patch [ 12552964 ]
          Wei Chen added a comment -

          a new patch.

          Maarten Coene added a comment -

          I think this could break the locking in another way.
          When process A finishes normally it could delete the lock file of process B:

          process A: take lock (with deleteOnExit)
          process A: release lock
          process B: take lock
          process A terminates: the lock will get deleted (but the ShutdownHook now deletes the lock of process B!!)
          process C: take lock -> oops: both process B and process C are in!

          Maarten
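          The sequence above can be replayed in a single JVM as a simulation. Everything here is hypothetical scaffolding: the Proc class stands in for a separate build process, and its exitList stands in for the JVM's path-based delete-on-exit list. It is only meant to show that a path-based cleanup which is never unregistered ends up deleting another holder's lock.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-JVM replay of the three-process scenario.
class DeleteOnExitRace {

    /** One simulated process; exitList mimics a path-based delete-on-exit list. */
    static class Proc {
        final List<File> exitList = new ArrayList<>();

        boolean take(File lock) throws IOException {
            if (!lock.createNewFile()) {
                return false;
            }
            exitList.add(lock); // like deleteOnExit(): remembers the path only
            return true;
        }

        void release(File lock) {
            lock.delete();      // the path stays in exitList
        }

        void exit() {
            for (File f : exitList) {
                f.delete();     // deletes whatever is at that path now
            }
        }
    }

    /** Returns true if B and C both end up believing they hold the lock. */
    static boolean replay(File lock) throws IOException {
        Proc a = new Proc(), b = new Proc(), c = new Proc();
        a.take(lock);
        a.release(lock);
        boolean bHolds = b.take(lock);
        a.exit();                       // removes B's lock file
        boolean cHolds = c.take(lock);  // succeeds although B never released
        lock.delete();
        return bHolds && cHolds;
    }

    public static void main(String[] args) throws Exception {
        File lock = new File(System.getProperty("java.io.tmpdir"),
                "race-" + System.nanoTime() + ".lck");
        System.out.println("B and C both hold the lock: " + replay(lock));
    }
}
```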

          Wei Chen made changes -
          Field Original Value New Value
          Attachment FileBasedLockStrategy.java.patch [ 12552847 ]
          Wei Chen added a comment -

          One line change only: enabled file.deleteOnExit()

          Wei Chen created issue -

            People

            • Assignee:
              Nicolas Lalevée
              Reporter:
              Wei Chen
            • Votes:
              3
              Watchers:
              8

                Time Tracking

                Estimated: 0.5m
                Remaining: 0.5m
                Logged: Not Specified