Solr: SOLR-3621

Fix concurrency race around newIndexWriter

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, 6.0
    • Component/s: update
    • Labels:
      None

      Description

      When I did the first big refactor of the update handler, I tried never to close the IndexWriter. I had to give up on that goal because of the replication handler, which requires rebooting the IndexWriter. At the time, I settled for allowing a small race that didn't show up as an issue in tests; this IW reboot had always been a bit of a hack anyhow.

      Now that the dust has settled, we should make this airtight. I'd like to make opening a new IndexWriter a first-class citizen rather than a hacky method used minimally, only by replication to reboot things. It should be a solid API that is valid for any future use.

      For some IndexWriter config changes, we may want to open a new writer on core reload in some cases.

      To do this, we have to start reference counting IndexWriter use, so that we only actually open a new writer and close the old one when the old one is not in use at all.
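A minimal Java sketch of the reference-counting idea described above, assuming a generic `Closeable` writer and a `Supplier` that opens a fresh one. The class `RefCountedWriterHolder` and its methods `acquire()`, `release()`, and `newWriter()` are hypothetical names chosen for illustration, not Solr's actual update-handler API. Callers borrow the writer under a count; a swap first blocks new borrowers, waits for the count to drain to zero, and only then closes the old writer and opens the new one.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not Solr's actual core-state code: hands out a shared
 * writer under a reference count and only swaps it when the count is zero.
 */
class RefCountedWriterHolder<W extends Closeable> {
    private final Supplier<W> factory; // assumed hook that opens a fresh writer
    private W writer;
    private int refCount = 0;          // callers currently holding the writer
    private boolean swapping = false;  // true while newWriter() is in progress

    RefCountedWriterHolder(Supplier<W> factory) {
        this.factory = factory;
        this.writer = factory.get();
    }

    /** Borrows the current writer; blocks while a swap is in progress. */
    synchronized W acquire() throws InterruptedException {
        while (swapping) {
            wait();
        }
        refCount++;
        return writer;
    }

    /** Returns a writer obtained from acquire(). */
    synchronized void release() {
        if (refCount <= 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        refCount--;
        notifyAll(); // wake a pending newWriter() waiting for the count to drain
    }

    /** Opens a new writer, closing the old one only after all borrowers release it. */
    synchronized void newWriter() throws InterruptedException, IOException {
        while (swapping) {
            wait(); // serialize concurrent swaps
        }
        swapping = true; // new acquire() calls now block
        try {
            while (refCount > 0) {
                wait(); // woken by release()
            }
            writer.close();
            writer = factory.get();
        } finally {
            swapping = false; // reopen even if close() or the factory throws
            notifyAll();
        }
    }

    synchronized int refCount() {
        return refCount;
    }
}
```

Every `acquire()` should be paired with a `release()` in a finally block; a leaked reference keeps the count above zero and `newWriter()` then waits indefinitely. A single monitor with `wait`/`notifyAll` keeps the sketch short; a production version might prefer `java.util.concurrent` locks and timeouts.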

        Activity

        Mark Miller added a comment -

        A first patch.

        Uwe Schindler added a comment -

        Hi, this seems to have caused a hang in DIH:

        [junit4:junit4] "TEST-TestScope-org.apache.solr.handler.dataimport.TestSqlEntityProcessorDelta2.testCompositePk_DeltaImport_delete-seed#[8C1D20BFF6E9C021]" prio=10 tid=0x00007f1cb4560800 nid=0x1227 in Object.wait() [0x00007f1c92c02000]
        [junit4:junit4]    java.lang.Thread.State: WAITING (on object monitor)
        [junit4:junit4] 	at java.lang.Object.wait(Native Method)
        [junit4:junit4] 	- waiting on <0x00000000fb258658> (a org.apache.solr.update.DefaultSolrCoreState)
        [junit4:junit4] 	at java.lang.Object.wait(Object.java:485)
        [junit4:junit4] 	at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:59)
        [junit4:junit4] 	- locked <0x00000000fb258658> (a org.apache.solr.update.DefaultSolrCoreState)
        [junit4:junit4] 	at org.apache.solr.update.DirectUpdateHandler2.deleteAll(DirectUpdateHandler2.java:140)
        [junit4:junit4] 	at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:361)
        [junit4:junit4] 	- locked <0x00000000fb2584d8> (a org.apache.solr.update.DirectUpdateHandler2)
        [junit4:junit4] 	at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:67)
        [junit4:junit4] 	at org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
        [junit4:junit4] 	at org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteByQuery(DistributedUpdateProcessor.java:728)
        [junit4:junit4] 	at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:601)
        [junit4:junit4] 	at org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:55)
        [junit4:junit4] 	at org.apache.solr.handler.dataimport.AbstractDataImportHandlerTestCase$TestUpdateRequestProcessor.processDelete(AbstractDataImportHandlerTestCase.java:364)
        

        See this build for a complete stack trace: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/1291/console

        Mark Miller added a comment -

        Looking...

        Mark Miller added a comment -

        I think this is related to the failure that kept happening in the DIH tests previously (I was actually hoping this change would help fix that test, which is why I started looking at this again the other day).

        Part of the problem with that test may be that it uses RAMDir. I don't think RAMDir is compatible, in Solr, with booting a new IndexWriter, which happens during rollback, which in turn happens in some DIH error situations. I think that's behind the intermittent "no segments file found" failure.

        Anyhow, my best current guess for this case: opening a new writer fails with an exception, the getWriter lock is never released, and the next caller that tries to get a writer waits forever. I'm changing the code to release that lock in a finally block, so hopefully the true error is more visible and there is no hang.

        The other fix may be to hard-code fsdir for the tests that might roll back, but I'll hold off on that.
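A minimal Java sketch of the failure mode guessed at in the comment above and of the finally-based fix, assuming a simple "pause" flag guarded by the object's monitor. `WriterGate`, `openNewWriterUnsafe()`, `openNewWriter()`, and `tryGetWriter()` are hypothetical names, not Solr's `DefaultSolrCoreState` code. In the unsafe variant, an exception from the opener skips the code that clears the flag, so later callers wait indefinitely; in the fixed variant the flag is cleared in finally, and the original exception still propagates to the caller.

```java
/**
 * Hypothetical sketch (not Solr's DefaultSolrCoreState): a "pause" flag that
 * blocks writer lookups while a new writer is being opened.
 */
class WriterGate {
    private boolean pauseWriter = false;

    /** Buggy variant: if the opener throws, pauseWriter stays true forever. */
    synchronized void openNewWriterUnsafe(Runnable opener) {
        pauseWriter = true;
        opener.run();       // an exception here skips the two lines below
        pauseWriter = false;
        notifyAll();
    }

    /** Fixed variant: the gate is reopened in finally; the real error still propagates. */
    synchronized void openNewWriter(Runnable opener) {
        pauseWriter = true;
        try {
            opener.run();
        } finally {
            pauseWriter = false;
            notifyAll();
        }
    }

    /**
     * Waits for the gate to open. Uses a timeout so the demo cannot hang;
     * the reported hang shows an untimed Object.wait() instead.
     */
    synchronized boolean tryGetWriter(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (pauseWriter) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // an untimed wait() would block here indefinitely
            }
            wait(remaining);
        }
        return true;
    }
}
```

The timeout exists only to make the difference observable: after a failed open, `tryGetWriter` returns `false` for the unsafe gate and `true` for the fixed one.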

        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        Robert Muir added a comment -

        Moving all 4.0 issues not touched in a month to 4.1.

        Mark Miller added a comment -

        Haven't seen any reports of problems here in a while, and all of this has hardened a fair amount by now.


          People

          • Assignee:
            Mark Miller
            Reporter:
            Mark Miller
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved:
