Solr / SOLR-240

java.io.IOException: Lock obtain timed out: SimpleFSLock

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2
    • Fix Version/s: 1.3
    • Component/s: update
    • Labels:
      None
    • Environment:

      Windows XP

      Description

      When running the attached sample application (ThrashIndex.java) against Solr, it will eventually die with this error. The same failure has occurred on both Windows and RHEL4 Linux. The app simply submits documents with an id in batches of 10, performs a commit, and then repeats over and over again.
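
      Below is an editor-added sketch of the kind of driver described above (it is not the attached ThrashIndex.java; the update URL, field name, and client code are assumptions). It posts batches of 10 documents to Solr's XML update handler and commits after each batch, in an endless loop:

        import java.io.OutputStream;
        import java.net.HttpURLConnection;
        import java.net.URL;

        // Hypothetical stand-in for ThrashIndex.java: repeatedly add small batches
        // of documents and commit, the pattern that eventually triggers the
        // "Lock obtain timed out" exception described in this issue.
        public class LockThrasher {

            // Assumed update endpoint for a default single-core Solr install.
            private static final String UPDATE_URL = "http://localhost:8983/solr/update";

            public static void main(String[] args) throws Exception {
                int id = 0;
                while (true) {
                    StringBuilder xml = new StringBuilder("<add>");
                    for (int i = 0; i < 10; i++) {               // batches of 10 docs
                        xml.append("<doc><field name=\"id\">doc-").append(id++)
                           .append("</field></doc>");
                    }
                    xml.append("</add>");
                    post(xml.toString());                        // submit the batch
                    post("<commit/>");                           // commit, then repeat
                }
            }

            private static void post(String body) throws Exception {
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(UPDATE_URL).openConnection();
                conn.setRequestMethod("POST");
                conn.setDoOutput(true);
                conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
                OutputStream out = conn.getOutputStream();
                out.write(body.getBytes("UTF-8"));
                out.close();
                if (conn.getResponseCode() != 200) {
                    System.err.println("Update failed: HTTP " + conn.getResponseCode());
                }
                conn.disconnect();
            }
        }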

      Attachments

      1. ASF.LICENSE.NOT.GRANTED--IndexWriter.patch
        1 kB
        Will Johnson
      2. IndexWriter2.patch
        5 kB
        Hoss Man
      3. IndexWriter2.patch
        5 kB
        Hoss Man
      4. IndexWriter2.patch
        4 kB
        Will Johnson
      5. stacktrace.txt
        2 kB
        Will Johnson
      6. ThrashIndex.java
        2 kB
        Will Johnson

          Activity

          Yonik Seeley added a comment -

          Thanks Will, I'll try and reproduce this.

          Will Johnson added a comment -

          I found this:

           http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/store/NativeFSLockFactory.html

           And so I made the attached patch, which seems to run at least 100x longer than without it.

          • will
          Hoss Man added a comment -

           The idea of using different lock implementations has come up in the past:

           http://www.nabble.com/switch-to-native-locks-by-default--tf2967095.html

           One reason not to hardcode native locks was that not all file systems support it, so we left in the usage of SimpleFSLock because it's the most generally reusable.

           Rather than change from one hardcoded lock type to another hardcoded lock type, we should support a config option for making the choice ... perhaps even adding a SolrLockFactory that defines an init(NamedList) method and creating simple Solr subclasses of all the core Lucene LockFactory impls, so it's easy for people to write their own if they want (and we don't just have "if (lockType.equals("simple"))..." type config parsing).
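
           As an editor-added illustration only, a SolrLockFactory of the kind described might look roughly like this (none of these class names exist in Solr; a plain Map stands in for Solr's NamedList, and the Lucene 2.x LockFactory method signatures are assumed):

             import java.io.IOException;
             import java.util.Map;

             import org.apache.lucene.store.Lock;
             import org.apache.lucene.store.LockFactory;
             import org.apache.lucene.store.NativeFSLockFactory;

             // Hypothetical base class: a LockFactory that Solr could configure from solrconfig.xml.
             public abstract class SolrLockFactory extends LockFactory {
                 /** Called once with the plugin's configuration arguments. */
                 public abstract void init(Map<String, String> args) throws IOException;
             }

             // Hypothetical subclass delegating to one of the core Lucene factories.
             class NativeSolrLockFactory extends SolrLockFactory {
                 private NativeFSLockFactory delegate;

                 public void init(Map<String, String> args) throws IOException {
                     // "lockDir" is an assumed argument name: the directory holding the lock files.
                     delegate = new NativeFSLockFactory(args.get("lockDir"));
                 }

                 public Lock makeLock(String lockName) {
                     return delegate.makeLock(lockName);
                 }

                 public void clearLock(String lockName) throws IOException {
                     delegate.clearLock(lockName);
                 }
             }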

          Will Johnson added a comment -

           The attached patch adds a param to SolrIndexConfig called useNativeLocks. The default is false, which keeps the existing design using SimpleFSLockFactory. If people think we should allow fully pluggable locking mechanisms I'm game, but I wasn't quite sure how to tackle that problem.

           As for testing, I wasn't quite sure how to run tests to ensure that the locks were working beyond some basic println's (which passed). If anyone has good ideas I'm all ears.

          • will
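
           An editor-added sketch of roughly what such a useNativeLocks switch boils down to when the index Directory is opened (this is not the attached patch; the Lucene 2.x FSDirectory and LockFactory calls are recalled from memory and should be checked against the actual version):

             import java.io.File;

             import org.apache.lucene.store.FSDirectory;
             import org.apache.lucene.store.LockFactory;
             import org.apache.lucene.store.NativeFSLockFactory;
             import org.apache.lucene.store.SimpleFSLockFactory;

             // Sketch: choose the lock factory for the index directory from a boolean flag.
             public class LockChoiceSketch {
                 public static void main(String[] args) throws Exception {
                     File indexDir = new File("solr/data/index");        // assumed index path
                     boolean useNativeLocks = args.length > 0 && Boolean.parseBoolean(args[0]);

                     LockFactory lf = useNativeLocks
                             ? new NativeFSLockFactory(indexDir)         // OS-level locks, freed if the JVM dies
                             : new SimpleFSLockFactory(indexDir);        // lock files, may be left behind on a crash
                     FSDirectory dir = FSDirectory.getDirectory(indexDir, lf);
                     System.out.println("Using " + dir.getLockFactory().getClass().getSimpleName());
                     dir.close();
                 }
             }
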
          Yonik Seeley added a comment -

          I'm running ThrashIndex against two solr/resin instances on a RHEL4 box, one using the servlet, another using the new dispatch filter. I haven't seen any exceptions for either yet...

          Will Johnson added a comment -

           I get the stacktrace below with the latest from HEAD with useNativeLocks turned off (from my patch). This took about 2 minutes to reproduce on my Windows laptop.

           One thing I thought of is that the local antivirus scanning / backup software we run here may be getting in the way. I know many other search engines and high-performance databases have issues with antivirus software because it causes similar locking problems. I'm disabling as much of the IT 'malware' as possible and seeing better results even with default locking; however, I had everything running when I got good results with native locking enabled, so it still seems to be a good idea to use the patch (or something similar).

          • will

           SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: SimpleFSLock@solr\data\index\lucene-e7b822c61c394dd5f449aaf5e5717356-write.lock
          at org.apache.lucene.store.Lock.obtain(Lock.java:70)
          at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:579)
          at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:391)
          at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:81)
          at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:120)
          at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:181)
          at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:259)
          at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:166)
          at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
          at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:79)
          at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
          at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:198)
          at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:166)
          at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
          at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
          at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
          at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
          at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
          at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
          at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
          at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
          at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
          at org.mortbay.jetty.Server.handle(Server.java:285)
          at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
          at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
          at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
          at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
          at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
          at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:368)
          at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

          Luis Neves added a comment -

          Hi,
          I'm also experiencing this problem. Every 4 hours or so the server starts spewing this stacktrace when updating the index.

          My environment is:

          Resin 3.0.23 / Solr 1.2 / JDK6u1
          $ cat /proc/version
          Linux version 2.6.16-1-686-smp (Debian 2.6.16-10) (waldi@debian.org) (gcc version 4.0.4 20060422 (prerelease) (Debian 4.0.3-2)) #2 SMP Tue Apr 25 20:45:37 UTC 2006

           The IndexWriter2.patch seems to fix the problem; the server has now been running for 48 hours without problems.

          Yonik Seeley added a comment -

          > And so I made the attached patch which seems to run at least 100x longer than without.

          Does this mean you still had occasional issues with native locking?

           Does anyone ever see exceptions relating to removal of the lock file (presumably that's why it can't be acquired by the new IndexWriter instance)?

           It's worrying that it's also reproducible on Linux... (although the oldest Solr collections have been running at CNET for 2 years now, and I've never seen this happen). What I have seen is that exact exception when the server died, restarted, and then couldn't grab the write lock... normally due to too small a heap causing excessive GC and leading Resin's wrapper to restart the container.

          Will Johnson added a comment -

           > > And so I made the attached patch which seems to run at least 100x longer than without.
           > Does this mean you still had occasional issues with native locking?

           No, after I applied the patch I have never seen a lockup.

           > ... the oldest Solr collections have been running at CNET for 2 years now, and I've never seen this happen. What I have seen is that exact exception when the server died, restarted, and then couldn't grab the write lock... normally due to too small a heap causing excessive GC and leading Resin's wrapper to restart the container.

           Another reason to use native locking. From the Lucene NativeFSLockFactory javadocs: "Furthermore, if the JVM crashes, the OS will free any held locks, whereas SimpleFSLockFactory will keep the locks held, requiring manual removal before re-running Lucene."

           My hunch (and that's all it is) is that whether people see the issue may come down to usage patterns. My project is heavily focused on low indexing latency, so we're doing huge numbers of adds/deletes/commits/searches in very fast succession and from multiple clients. A more batch-oriented update pattern may not see the issue.

           The patch, as is, doesn't change any API or cause any change to existing functionality whatsoever unless you use the new option in solrconfig. I would argue that native locking should be the default, though.

          • will
          Hoss Man added a comment -

           This is a variation on Will's IndexWriter2.patch that replaces the "useNativeLocks" boolean config option with a string config option allowing people to pick any of the four built-in Lucene lock factories.

           (I'd been meaning to try and write a "LockFactoryFactory" to allow people to specify any arbitrary LockFactory impl as a plugin, but it seemed like overkill. Having Will's useNativeLocks option didn't preclude adding something like that later, but recent comments reminded me that for the majority of Solr users, the "NoLockFactory" would actually be perfectly fine, since Solr only ever opens one IndexWriter at a time.)

           So this patch provides a little more flexibility than the previous one, without going whole-hog to a FactoryFactory/plugin model.

           It should be noted that I left the hardcoded default in the code as SimpleFSLockFactory, but I set the "example" default to NoLockFactory with a comment that that should be fine for any Solr user not modifying the index externally to Solr.

          comments?
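
           An editor-added sketch of the kind of name-to-factory mapping such a string option implies; the option names ("simple", "native", "single", "none") and the Lucene 2.x constructors are assumptions, not a copy of IndexWriter2.patch. The returned factory would then be handed to the FSDirectory that SolrIndexWriter opens:

             import java.io.File;
             import java.io.IOException;

             import org.apache.lucene.store.LockFactory;
             import org.apache.lucene.store.NativeFSLockFactory;
             import org.apache.lucene.store.NoLockFactory;
             import org.apache.lucene.store.SimpleFSLockFactory;
             import org.apache.lucene.store.SingleInstanceLockFactory;

             // Sketch: map a configured lock type name to one of Lucene's built-in lock factories.
             public final class LockTypes {

                 public static LockFactory createLockFactory(String lockType, File indexDir)
                         throws IOException {
                     if ("simple".equals(lockType)) {
                         return new SimpleFSLockFactory(indexDir);   // lock files on disk (the old default)
                     } else if ("native".equals(lockType)) {
                         return new NativeFSLockFactory(indexDir);   // OS-level file locks
                     } else if ("single".equals(lockType)) {
                         return new SingleInstanceLockFactory();     // in-JVM locks only
                     } else if ("none".equals(lockType)) {
                         return NoLockFactory.getNoLockFactory();    // no locking at all
                     }
                     throw new IllegalArgumentException("Unknown lockType: " + lockType);
                 }
             }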

          Yonik Seeley added a comment -

          > i set the "example" default to be NoLockFactory

          How about SingleInstanceLockFactory to aid in catching concurrency bugs?

          Yonik Seeley added a comment -

          > SingleInstanceLockFactory
           Or even better, a subclass or other implementation: a SingleInstanceWarnLockFactory or SingleInstanceCoordinatedLockFactory that logs a failure if obtain() is called for a lock that is already locked.
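
           As an editor-added sketch, a SingleInstanceWarnLockFactory along those lines might look like this; the class name is hypothetical, and the Lucene 2.x Lock/LockFactory signatures (where release() does not throw) are assumed:

             import java.io.IOException;
             import java.util.logging.Logger;

             import org.apache.lucene.store.Lock;
             import org.apache.lucene.store.SingleInstanceLockFactory;

             // Behaves like SingleInstanceLockFactory, but logs whenever obtain() is
             // attempted on a lock that is already held, which would indicate a
             // concurrency bug since Solr only opens one IndexWriter at a time.
             public class SingleInstanceWarnLockFactory extends SingleInstanceLockFactory {

                 private static final Logger log =
                         Logger.getLogger(SingleInstanceWarnLockFactory.class.getName());

                 public Lock makeLock(final String lockName) {
                     final Lock inner = super.makeLock(lockName);
                     return new Lock() {
                         public boolean obtain() throws IOException {
                             if (inner.isLocked()) {
                                 log.severe("obtain() called on already-held lock: " + lockName);
                             }
                             return inner.obtain();
                         }

                         public void release() {
                             inner.release();
                         }

                         public boolean isLocked() {
                             return inner.isLocked();
                         }
                     };
                 }
             }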

          Hoss Man added a comment -

           Good point about recommending 'single' in the event of concurrency bugs.

           I've never really looked at the internals of the LockFactories, so I'm going to punt on the subclass idea for now (I like it, I just don't have time to do it), but we can always redefine "single" later. (I'll open another bug if we're okay with committing this new patch as is.)

           The revised patch just changes the wording and suggested value in solrconfig.xml.

          objections?

          Yonik Seeley added a comment -

          No objections... a hang (in the event of bugs) will suffice for now.

          Hoss Man added a comment -

          Committed revision 556099.


            People

            • Assignee:
              Hoss Man
            • Reporter:
              Will Johnson
            • Votes:
              0
            • Watchers:
              0
