Solr
  1. Solr
  2. SOLR-3861

Refactor SolrCoreState so that it's managed by SolrCore .

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      SOLR-2008 fixed a possible RejectedExecutionException by ensuring that SolrCore closed the updateHandler before the searcherExecutor.

      Mark Miller re-flipped this logic in r1159378, which is annotated as fixing both SOLR-2654 and SOLR-2654 (dup typo i guess) but it's not clear why - pretty sure this means that the risk of a Rejected exception is back in 4.0-BETA...

      https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/core/SolrCore.java?r1=1146905&r2=1159378

      1. SOLR-3861.patch
        13 kB
        Mark Miller
      2. SOLR-3861.patch
        20 kB
        Mark Miller
      3. SOLR-3861.patch
        3 kB
        Mark Miller

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          mark: can you please sanity check whether this was an intentional change or a fluke? is the original risk in SOLR-2008 now eliminated by some other change to how the UpdateHandler is used?

          Show
          Hoss Man added a comment - mark: can you please sanity check whether this was an intentional change or a fluke? is the original risk in SOLR-2008 now eliminated by some other change to how the UpdateHandler is used?
          Hide
          Sami Siren added a comment -

          Not sure if this is related but the famous TestReplicationHandler failures seem to throw the mentioned RejectedExecutionException when the test fails. for example here: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1221/

          Show
          Sami Siren added a comment - Not sure if this is related but the famous TestReplicationHandler failures seem to throw the mentioned RejectedExecutionException when the test fails. for example here: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/1221/
          Hide
          Mark Miller added a comment -

          I'll look at this now. I did it based on some experience with the chaos monkey - I did not consider this past issue though, so I'll have to revisit this.

          Show
          Mark Miller added a comment - I'll look at this now. I did it based on some experience with the chaos monkey - I did not consider this past issue though, so I'll have to revisit this.
          Hide
          Mark Miller added a comment -

          So I did this as part of a long effort at hardening recovery. Some things I tried and seemed to help at some point for some reason, and then later fixes made them unnecessary. I think that was the case with this change, and I'll flip back the ordering. Std tests and chaos monkey tests are still passing.

          Show
          Mark Miller added a comment - So I did this as part of a long effort at hardening recovery. Some things I tried and seemed to help at some point for some reason, and then later fixes made them unnecessary. I think that was the case with this change, and I'll flip back the ordering. Std tests and chaos monkey tests are still passing.
          Hide
          Mark Miller added a comment -

          I see what the issue is - closing the updatehandler will close the dir factory - before the searcher releases it's dir. That is why it's switched.

          Show
          Mark Miller added a comment - I see what the issue is - closing the updatehandler will close the dir factory - before the searcher releases it's dir. That is why it's switched.
          Hide
          Mark Miller added a comment -

          I'm not sure I see the problem with the rejected execution, but I do see the problem of closing things in an order that prevents it. I think the current code is correct.

          Show
          Mark Miller added a comment - I'm not sure I see the problem with the rejected execution, but I do see the problem of closing things in an order that prevents it. I think the current code is correct.
          Hide
          Mark Miller added a comment -

          Hossman - closing this as I don't think it requires a change. If you disagree, please reopen.

          Show
          Mark Miller added a comment - Hossman - closing this as I don't think it requires a change. If you disagree, please reopen.
          Hide
          Mark Miller added a comment -

          In IRC Hossman brought up the case where you don't use an update log - apparently auto commit promises a final commit before shutdown. We probably need better testing for that type of functionality, but we should deal with this.

          Show
          Mark Miller added a comment - In IRC Hossman brought up the case where you don't use an update log - apparently auto commit promises a final commit before shutdown. We probably need better testing for that type of functionality, but we should deal with this.
          Hide
          Mark Miller added a comment -

          If someone wants to work on it now, we might be able to get it in any rc respin though.

          Show
          Mark Miller added a comment - If someone wants to work on it now, we might be able to get it in any rc respin though.
          Hide
          Mark Miller added a comment -

          I think something like this would work - a little hokey on the API side though.

          Show
          Mark Miller added a comment - I think something like this would work - a little hokey on the API side though.
          Hide
          Mark Miller added a comment -

          Patch is a little off - doesn't pass tests - but shows the general idea that we probably want to improve on.

          Show
          Mark Miller added a comment - Patch is a little off - doesn't pass tests - but shows the general idea that we probably want to improve on.
          Hide
          Hoss Man added a comment -

          comments on the patch:

          1) overall it seems correct.

          2) probably want to move the ":HACK: normally we rely on updateHandler to do this" down to a new else clauses on the the if block where getSolrCoreState().tryCloseDirectory() is called just for sanity's sake.

          3) Maybe i'm missing something, but it's not really clear to me why we need the new tryCloseDirectory ... it seems like it would be more straight forward just to make SolrCore responsible for creating/decrefing/closing the SolrCoreState before/after creating/closing the UpdateHandler? (the three arg UpdateHandler constructor could still ask the previous UpdateHandler for the solrCoreState, but the default behavior would be to get it from the SolreCore. )

          Show
          Hoss Man added a comment - comments on the patch: 1) overall it seems correct. 2) probably want to move the ":HACK: normally we rely on updateHandler to do this" down to a new else clauses on the the if block where getSolrCoreState().tryCloseDirectory() is called just for sanity's sake. 3) Maybe i'm missing something, but it's not really clear to me why we need the new tryCloseDirectory ... it seems like it would be more straight forward just to make SolrCore responsible for creating/decrefing/closing the SolrCoreState before/after creating/closing the UpdateHandler? (the three arg UpdateHandler constructor could still ask the previous UpdateHandler for the solrCoreState, but the default behavior would be to get it from the SolreCore. )
          Hide
          Mark Miller added a comment -

          3)

          I think it was actually more straight forward to do what I first did - only took a second. Your suggestion makes more logical sense and solves the hackey API issue, but I think it's more of a tricky change to get right. I've started playing with it.

          Show
          Mark Miller added a comment - 3) I think it was actually more straight forward to do what I first did - only took a second. Your suggestion makes more logical sense and solves the hackey API issue, but I think it's more of a tricky change to get right. I've started playing with it.
          Hide
          Yandong Yao added a comment -

          @Hoss,

          Comment related with SOLR-2008 and this:

          We are using many shards, and will load/unload shards according to its usage pattern. Sometime when we try to unload shards, we encounter issues mentioned in SOLR-2008.

          I am not sure whether this will result in any data loss, (we have encountered some issues, while it is caused by our client code mostly, we have used read write lock to protect unloading shard code, so that if there are threads indexing data to this shard, then it won't be unloaded. After this change, so far so good. while we still encounter exception mentioned in SOLR-2008 in solr log).

          Also we have encountered following exception sometimes when we try to unloading shard (we will always call commit before unloading shard):

          BTW: we are using one old (lucene.version=4.0-SNAPSHOT 1189391) solr 4 snapshot build.

          Sep 14, 2012 4:49:22 AM org.apache.solr.common.SolrException log
          SEVERE: auto commit error...:java.lang.RuntimeException: org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted
          at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1140)
          at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:999)
          at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:339)
          at org.apache.solr.update.CommitTracker.run(CommitTracker.java:184)
          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
          at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
          at java.util.concurrent.FutureTask.run(FutureTask.java:138)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:662)
          Caused by: org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted
          at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:644)
          at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:84)
          at org.apache.lucene.index.IndexReader.open(IndexReader.java:536)
          at org.apache.lucene.index.IndexReader.open(IndexReader.java:495)
          at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38)
          at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:109)
          at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1130)
          ... 11 more
          Caused by: java.lang.InterruptedException: sleep interrupted
          at java.lang.Thread.sleep(Native Method)
          at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:642)
          ... 17 more

          Show
          Yandong Yao added a comment - @Hoss, Comment related with SOLR-2008 and this: We are using many shards, and will load/unload shards according to its usage pattern. Sometime when we try to unload shards, we encounter issues mentioned in SOLR-2008 . I am not sure whether this will result in any data loss, (we have encountered some issues, while it is caused by our client code mostly, we have used read write lock to protect unloading shard code, so that if there are threads indexing data to this shard, then it won't be unloaded. After this change, so far so good. while we still encounter exception mentioned in SOLR-2008 in solr log). Also we have encountered following exception sometimes when we try to unloading shard (we will always call commit before unloading shard): BTW: we are using one old (lucene.version=4.0-SNAPSHOT 1189391) solr 4 snapshot build. Sep 14, 2012 4:49:22 AM org.apache.solr.common.SolrException log SEVERE: auto commit error...:java.lang.RuntimeException: org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1140) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:999) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:339) at org.apache.solr.update.CommitTracker.run(CommitTracker.java:184) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.lucene.util.ThreadInterruptedException: java.lang.InterruptedException: sleep interrupted at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:644) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:84) at org.apache.lucene.index.IndexReader.open(IndexReader.java:536) at org.apache.lucene.index.IndexReader.open(IndexReader.java:495) at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:38) at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:109) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1130) ... 11 more Caused by: java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:642) ... 17 more
          Hide
          Mark Miller added a comment -

          Here is a rough patch managing the SolrCore state more in SolrCore.

          Something is off still - TestIndexingPerformance fails - the final rollback it does causes a write.lock to be left behind when we try and close a directory.

          Show
          Mark Miller added a comment - Here is a rough patch managing the SolrCore state more in SolrCore. Something is off still - TestIndexingPerformance fails - the final rollback it does causes a write.lock to be left behind when we try and close a directory.
          Hide
          Mark Miller added a comment - - edited

          Here is a completed patch.

          Show
          Mark Miller added a comment - - edited Here is a completed patch.
          Hide
          Mark Miller added a comment -

          Looks like as Yonik says in SOLR-3884, the final shutdown commit is explicit not something that comes from auto commit. And the searcher we may try to open and get a rejeceted execution on, is done after commit anyway.

          So I'm back to really not thinking this is a problem. The refactoring work done in the last patch is much better then what we have though.

          Show
          Mark Miller added a comment - Looks like as Yonik says in SOLR-3884 , the final shutdown commit is explicit not something that comes from auto commit. And the searcher we may try to open and get a rejeceted execution on, is done after commit anyway. So I'm back to really not thinking this is a problem. The refactoring work done in the last patch is much better then what we have though.
          Hide
          Mark Miller added a comment -

          So I'll commit this to 5x and 4x, but my current thinking is again that it's not a big deal at all, so I would not try and put it in 4.0.

          Show
          Mark Miller added a comment - So I'll commit this to 5x and 4x, but my current thinking is again that it's not a big deal at all, so I would not try and put it in 4.0.
          Hide
          Yandong Yao added a comment -

          Hi Mark,

          do you mean this issue (RejectException) won't impact system such as lost uncommitted documents?

          It seems happen when autocommit code try to run while the core is closing, could we run commit before close and then disable autocommit in core closing code.

          How about above 'sleep interrupted' exception? Are they caused by same reason?

          Thanks,
          Yandong

          Show
          Yandong Yao added a comment - Hi Mark, do you mean this issue (RejectException) won't impact system such as lost uncommitted documents? It seems happen when autocommit code try to run while the core is closing, could we run commit before close and then disable autocommit in core closing code. How about above 'sleep interrupted' exception? Are they caused by same reason? Thanks, Yandong
          Hide
          Mark Miller added a comment -

          do you mean this issue (RejectException) won't impact system such as lost uncommitted documents?

          I don't believe so. If you have the UpdateLog on, you would recover any lost updates there - but this rejected execution is after the commit anyway - trying to open the searcher after the commit.

          Also, the UpdateHandler does an explicit commit on shutdown that has nothing to do with auto commit, and does not rely on needing to open a searcher after it's done.

          I think if anything, it's an ugly mark in the logs.

          The api changes above will remove the ugly part I think - but as far as I know it's all just cosmetic.

          Show
          Mark Miller added a comment - do you mean this issue (RejectException) won't impact system such as lost uncommitted documents? I don't believe so. If you have the UpdateLog on, you would recover any lost updates there - but this rejected execution is after the commit anyway - trying to open the searcher after the commit. Also, the UpdateHandler does an explicit commit on shutdown that has nothing to do with auto commit, and does not rely on needing to open a searcher after it's done. I think if anything, it's an ugly mark in the logs. The api changes above will remove the ugly part I think - but as far as I know it's all just cosmetic.
          Hide
          Mark Miller added a comment -

          I'll commit this refactor shortly.

          Show
          Mark Miller added a comment - I'll commit this refactor shortly.
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1392398

          SOLR-3861: Refactor SolrCoreState so that it's managed by SolrCore.

          Show
          Commit Tag Bot added a comment - [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1392398 SOLR-3861 : Refactor SolrCoreState so that it's managed by SolrCore.
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1392398

          SOLR-3861: Refactor SolrCoreState so that it's managed by SolrCore.

          Show
          Commit Tag Bot added a comment - [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1392398 SOLR-3861 : Refactor SolrCoreState so that it's managed by SolrCore.

            People

            • Assignee:
              Mark Miller
              Reporter:
              Hoss Man
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development