Solr
  1. Solr
  2. SOLR-3699

SolrIndexWriter constructor leaks Directory if Exception creating IndexWriterConfig

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0
    • Component/s: None
    • Labels:
      None

      Description

      in LUCENE-4278 i had to add a hack to force SimpleFSDir for CoreContainerCoreInitFailuresTest, because it doesnt close its Directory on certain errors.

      This might indicate a problem that leaks happen if certain errors happen (e.g. not handled in finally)

      1. SOLR-3699.patch
        11 kB
        Hoss Man
      2. SOLR-3699.patch
        10 kB
        Hoss Man
      3. SOLR-3699.patch
        8 kB
        Hoss Man
      4. SOLR-3699.patch
        8 kB
        Hoss Man
      5. SOLR-3699.patch
        3 kB
        Hoss Man

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          Tracked the problem down to SolrIndexWriter ... attached patch demonstrates it in the simplest usecase: a SolrCore that constructs a SolrIndexWriter where the Directory is created fine, but then the IndexWriterConfig has a problem.

          Unfortunately there's no clear and easy route to a fix because of how this is all done inline in a call to super(...) ... as noted in the test comments...

            public void testBogusMergePolicy() throws Exception {
              // Directory is leaked because SolrIndexWriter constructor has inline 
              // calls to both DirectoryFactory (which succeeds) and 
              // Config.toIndexWriterConfig (which fails) -- but there is nothing to 
              // decref the DerectoryFactory when Config throws an Exception
              // 
              // Not good to require the caller of "new SolrIndexWriter(...)" to decref 
              // the DirectoryFactory on exception, because they would have to be sure 
              // the exception didn't already come from the DirectoryFactory in the first place.
              // I think we need to re-work the inline calls in SolrIndexWriter construct
          

          (Ironically: this "bad-mp-config.xml" i was using in CoreContainerCoreInitFailuresTest has existed for a while, but wasn't already being used in the "TestBadConfig" class that tries to create SolrCores with bad cofigs – if it had we would have caught this a long time ago. It was only being used in SolrIndexConfigTest where it was micro testing the SolrIndexConfig and the DirectoryFactory wasn't used)

          Show
          Hoss Man added a comment - Tracked the problem down to SolrIndexWriter ... attached patch demonstrates it in the simplest usecase: a SolrCore that constructs a SolrIndexWriter where the Directory is created fine, but then the IndexWriterConfig has a problem. Unfortunately there's no clear and easy route to a fix because of how this is all done inline in a call to super(...) ... as noted in the test comments... public void testBogusMergePolicy() throws Exception { // Directory is leaked because SolrIndexWriter constructor has inline // calls to both DirectoryFactory (which succeeds) and // Config.toIndexWriterConfig (which fails) -- but there is nothing to // decref the DerectoryFactory when Config throws an Exception // // Not good to require the caller of " new SolrIndexWriter(...)" to decref // the DirectoryFactory on exception, because they would have to be sure // the exception didn't already come from the DirectoryFactory in the first place. // I think we need to re-work the inline calls in SolrIndexWriter construct (Ironically: this "bad-mp-config.xml" i was using in CoreContainerCoreInitFailuresTest has existed for a while, but wasn't already being used in the "TestBadConfig" class that tries to create SolrCores with bad cofigs – if it had we would have caught this a long time ago. It was only being used in SolrIndexConfigTest where it was micro testing the SolrIndexConfig and the DirectoryFactory wasn't used)
          Hide
          Hoss Man added a comment -

          My quick and dirty attempt to fix this by making SolrIndexWriter's constructor private and adding a static "create" method that deals with calling directoryFactory.release() if the private constructor fails.

          Unfortunately it's still not working ... not clear to me why, but i'm about to get on a plain and won't have a chance to dig into it anymore for another 3-4 days, so i wanted to get what i have into Jira in case anyone else wants to take a stab at it.

          Show
          Hoss Man added a comment - My quick and dirty attempt to fix this by making SolrIndexWriter's constructor private and adding a static "create" method that deals with calling directoryFactory.release() if the private constructor fails. Unfortunately it's still not working ... not clear to me why, but i'm about to get on a plain and won't have a chance to dig into it anymore for another 3-4 days, so i wanted to get what i have into Jira in case anyone else wants to take a stab at it.
          Hide
          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Show
          Robert Muir added a comment - rmuir20120906-bulk-40-change
          Hide
          Hoss Man added a comment -

          Figured out the problem in my last patch: i was ignorant of the full DirectoryFactory API and didn't realize i should be calling doneWithDirectory().

          I think this new patch is good to go, but i don't want to commit w/o review from someone who understands the DirectoryFactory semantics better (already opened SOLR-3717 because something looks wonky about the API, don't want to mess up and just fix a symptom here instead of the real problem

          Show
          Hoss Man added a comment - Figured out the problem in my last patch: i was ignorant of the full DirectoryFactory API and didn't realize i should be calling doneWithDirectory(). I think this new patch is good to go, but i don't want to commit w/o review from someone who understands the DirectoryFactory semantics better (already opened SOLR-3717 because something looks wonky about the API, don't want to mess up and just fix a symptom here instead of the real problem
          Hide
          Hoss Man added a comment -

          Mark: can you sanity check this patch for me?

          Show
          Hoss Man added a comment - Mark: can you sanity check this patch for me?
          Hide
          Hoss Man added a comment -

          updated patch to trunk.

          Doing this helped uncover another (distinct) case where directories were not being closed that was exposed by the test SOLR-3746 added: if the updateHandler fails to init, then nothing in SolrCore was closing the directoryfactory.

          I'd asked miller to review this earlier, but i'm going to assume CTR soon unless i hear objections.

          Show
          Hoss Man added a comment - updated patch to trunk. Doing this helped uncover another (distinct) case where directories were not being closed that was exposed by the test SOLR-3746 added: if the updateHandler fails to init, then nothing in SolrCore was closing the directoryfactory. I'd asked miller to review this earlier, but i'm going to assume CTR soon unless i hear objections.
          Hide
          Mark Miller added a comment -

          hmm...I have to look closer than just at the patch, but the setDirFactory method seems a little troubling - why do we do that instead of making it part of the constructor? What if you don't set it?

          +        if (null != directoryFactory) {
          +          // :HACK: normally we rely on updateHandler to do this, 
          +          // but what if updateHandler failed to init?
          +          directoryFactory.close();
          +        }
          

          I don't think you want that? The dir factory should and will be closed by the DefaultSolrCoreState when it's ref count hits 0 - you don't actually want to close it when closing a core.

          Otherwise, the basic idea seems fine - if the IW fails, release it's dir.

          Show
          Mark Miller added a comment - hmm...I have to look closer than just at the patch, but the setDirFactory method seems a little troubling - why do we do that instead of making it part of the constructor? What if you don't set it? + if ( null != directoryFactory) { + // :HACK: normally we rely on updateHandler to do this , + // but what if updateHandler failed to init? + directoryFactory.close(); + } I don't think you want that? The dir factory should and will be closed by the DefaultSolrCoreState when it's ref count hits 0 - you don't actually want to close it when closing a core. Otherwise, the basic idea seems fine - if the IW fails, release it's dir.
          Hide
          Hoss Man added a comment -

          the setDirFactory method seems a little troubling - why do we do that instead of making it part of the constructor? What if you don't set it?

          Are you asking about SolrIndexWriter.setDirectoryFactory ?

          it's private, and called from the (static) SolrIndexWriter.create – which as of this patch is the only way to get a new SolrIndexWRiter.

          I had to move it out of the SolrIndexWriter constructor, because we need to be able to pass the resulting directory to super() while also releasing the directory if the constructor throws an exception – this isn't possible within java's "calls to super(...) must be the first line of the constructor" constraints, so the static factory method was needed instead.

          The dir factory should and will be closed by the DefaultSolrCoreState when it's ref count hits 0 - you don't actually want to close it when closing a core.

          But the update handler is the only thing keeping track of the SolrCoreState – if the SolrCore has no UpdateHanlder (ie: because it failed to initialize) then something needs to close the DirectoryFactory that SolrCore.initDirectoryFactory() already created.

          (Go ahead and try to comment out that "HACK" directoryFactory.close() in the patch – you'll see TestBadConfig.testUpdateLogButNoVersionField start failing. I added that call to ensure that the DirectoryFactory gets closed if any exceptions are thrown between a (successsful) call to initDirectoryFactory() ~L641 of SolrCore and (a success of failure of) createUpdateHandler() ~L707. Don't get me wrong, it definitely feels like a hack (hence the comment) but something needs to account for this situation – and i wasn't comfortably completely re-writing/re-ordering the logic in the SolrCore constructor.

          Show
          Hoss Man added a comment - the setDirFactory method seems a little troubling - why do we do that instead of making it part of the constructor? What if you don't set it? Are you asking about SolrIndexWriter.setDirectoryFactory ? it's private, and called from the (static) SolrIndexWriter.create – which as of this patch is the only way to get a new SolrIndexWRiter. I had to move it out of the SolrIndexWriter constructor, because we need to be able to pass the resulting directory to super() while also releasing the directory if the constructor throws an exception – this isn't possible within java's "calls to super(...) must be the first line of the constructor" constraints, so the static factory method was needed instead. The dir factory should and will be closed by the DefaultSolrCoreState when it's ref count hits 0 - you don't actually want to close it when closing a core. But the update handler is the only thing keeping track of the SolrCoreState – if the SolrCore has no UpdateHanlder (ie: because it failed to initialize) then something needs to close the DirectoryFactory that SolrCore.initDirectoryFactory() already created. (Go ahead and try to comment out that "HACK" directoryFactory.close() in the patch – you'll see TestBadConfig.testUpdateLogButNoVersionField start failing. I added that call to ensure that the DirectoryFactory gets closed if any exceptions are thrown between a (successsful) call to initDirectoryFactory() ~L641 of SolrCore and (a success of failure of) createUpdateHandler() ~L707. Don't get me wrong, it definitely feels like a hack (hence the comment) but something needs to account for this situation – and i wasn't comfortably completely re-writing/re-ordering the logic in the SolrCore constructor.
          Hide
          Mark Miller added a comment -

          it's private,

          Ah, okay, did not catch that.

          something needs to close the DirectoryFactory that SolrCore.initDirectoryFactory() already created.

          Yeah, brain murmur - for some reason I was thinking DirectoryFactory was used globally across cores.

          This is probably fine then. Once I commit some stuff from my workspace, I'll actually apply the patch and try and consider this some more.

          Show
          Mark Miller added a comment - it's private, Ah, okay, did not catch that. something needs to close the DirectoryFactory that SolrCore.initDirectoryFactory() already created. Yeah, brain murmur - for some reason I was thinking DirectoryFactory was used globally across cores. This is probably fine then. Once I commit some stuff from my workspace, I'll actually apply the patch and try and consider this some more.
          Hide
          Hoss Man added a comment -

          Yeah, brain murmur - for some reason I was thinking DirectoryFactory was used globally across cores.

          I think it's only shared between reloaded versions of the same SolrCore (ie: two differnet SolrCore's can use completley diff DirectoryFactory impls) but even then, it's only shared if the SolrCore was constructed using an existing UpdateHandler (ie: reload) if the SolrCore is the "first" SolrCore, then it creates the DirectoryFactory, thus if the UpdateHandler doesn't init properly, that SolrCore needs to close the DirectoryFactory.

          Ideally, a whole metric-shit-load of the SolrCore initialization would be reworked, so thta every type of object (like DirectoryFactory) was only ever inited by one other object (like UpdateHandler) regardless of wether it's the first create or reload state, ... but i wasn't really comfortable making that heavy of a change.

          Show
          Hoss Man added a comment - Yeah, brain murmur - for some reason I was thinking DirectoryFactory was used globally across cores. I think it's only shared between reloaded versions of the same SolrCore (ie: two differnet SolrCore's can use completley diff DirectoryFactory impls) but even then, it's only shared if the SolrCore was constructed using an existing UpdateHandler (ie: reload) if the SolrCore is the "first" SolrCore, then it creates the DirectoryFactory, thus if the UpdateHandler doesn't init properly, that SolrCore needs to close the DirectoryFactory. Ideally, a whole metric-shit-load of the SolrCore initialization would be reworked, so thta every type of object (like DirectoryFactory) was only ever inited by one other object (like UpdateHandler) regardless of wether it's the first create or reload state, ... but i wasn't really comfortable making that heavy of a change.
          Hide
          Hoss Man added a comment -

          updated patch to trunk (incorporates the new IndexSplitter class yonik recently added)

          all tests pass, i'm going to press forward with committing so this can get more testing.

          Show
          Hoss Man added a comment - updated patch to trunk (incorporates the new IndexSplitter class yonik recently added) all tests pass, i'm going to press forward with committing so this can get more testing.
          Hide
          Hoss Man added a comment -

          Committed revision 1382187. - trunk
          Committed revision 1382192. - 4x

          Show
          Hoss Man added a comment - Committed revision 1382187. - trunk Committed revision 1382192. - 4x
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] Chris M. Hostetter
          http://svn.apache.org/viewvc?view=revision&revision=1382192

          SOLR-3699: Fixed some Directory leaks when there were errors during SolrCore or SolrIndexWriter initialization (merge r1382187)

          Show
          Commit Tag Bot added a comment - [branch_4x commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1382192 SOLR-3699 : Fixed some Directory leaks when there were errors during SolrCore or SolrIndexWriter initialization (merge r1382187)
          Hide
          Uwe Schindler added a comment -

          Closed after release.

          Show
          Uwe Schindler added a comment - Closed after release.

            People

            • Assignee:
              Hoss Man
              Reporter:
              Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development