Solr
  1. Solr
  2. SOLR-4380

ReplicationHandler replicateAfter startup not showing commits after SOLR-3911

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.1
    • Fix Version/s: 4.2, 6.0
    • Component/s: replication (java)
    • Labels:
      None

      Description

      In the process of upgrading to 4.1 from 3.6, I've noticed that our master servers do not show any commit points – e.g. via http://127.0.0.1:8983/solr/collection1/replication?command=commits – available until after a new commit happens. So, for static indexes, replication doesn't happen and for dynamic indexes, we have to wait until an incremental update of master for slaves to see any commits.

      Tracing through the code, it looks like the change that may have effected us was part of SOLR-3911, specifically commenting out the initialization of the newIndexWriter in the replicateAfterStartup block [1]:

      // TODO: perhaps this is no longer necessary then?
      // core.getUpdateHandler().newIndexWriter(true);
      

      I'm guessing this is commented out because it is assumed that indexCommitPoint was going to be set by that block, but when a slave requests commits, that goes back to core.getDeletionPolicy().getCommits() to fetch the list of commits. If no indexWriter has been initialized, then, as far as I can tell, IndexDeletionPolicyWrapper#onInit will not have been called and there will be no commits available.

      By uncommenting this line, I was able to see commits on startup and slaves began to replicate successfully.

      [1] http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java?annotate=1420992&diff_format=h&pathrev=1420992#l880

        Issue Links

          Activity

          Hide
          Andrzej Bialecki added a comment -

          I confirm this is an issue in my environment too, where we use a custom deletion policy to replicate additional side-load index data.

          If reopening a writer was too heavy-weight for NRT (was that the motivation for this change) then perhaps the replication code should use a different mechanism to obtain the list of IndexCommit-s if there is no writer?

          Show
          Andrzej Bialecki added a comment - I confirm this is an issue in my environment too, where we use a custom deletion policy to replicate additional side-load index data. If reopening a writer was too heavy-weight for NRT (was that the motivation for this change) then perhaps the replication code should use a different mechanism to obtain the list of IndexCommit-s if there is no writer?
          Hide
          Mark Miller added a comment -

          Gregg, can you check if the follow patch solves this for you?

          Index: solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java
          ===================================================================
          --- solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java	(revision 1440432)
          +++ solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java	(working copy)
          @@ -42,6 +42,7 @@
           import org.apache.lucene.index.IndexCommit;
           import org.apache.lucene.index.IndexDeletionPolicy;
           import org.apache.lucene.index.DirectoryReader;
          +import org.apache.lucene.index.IndexWriter;
           import org.apache.lucene.store.Directory;
           import org.apache.lucene.store.IOContext;
           import org.apache.lucene.store.IndexInput;
          @@ -877,10 +878,9 @@
                       }
                     }
           
          -          // reboot the writer on the new index
          -          // TODO: perhaps this is no longer necessary then?
          -         // core.getUpdateHandler().newIndexWriter(true);
          -
          +          // ensure the writer is init'd
          +          RefCounted<IndexWriter> iw = core.getUpdateHandler().getSolrCoreState().getIndexWriter(core);
          +          iw.decref();
                   } catch (IOException e) {
                     LOG.warn("Unable to get IndexCommit on startup", e);
                   } finally {
          
          
          Show
          Mark Miller added a comment - Gregg, can you check if the follow patch solves this for you? Index: solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java =================================================================== --- solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java (revision 1440432) +++ solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java (working copy) @@ -42,6 +42,7 @@ import org.apache.lucene.index.IndexCommit; import org.apache.lucene.index.IndexDeletionPolicy; import org.apache.lucene.index.DirectoryReader; +import org.apache.lucene.index.IndexWriter; import org.apache.lucene.store.Directory; import org.apache.lucene.store.IOContext; import org.apache.lucene.store.IndexInput; @@ -877,10 +878,9 @@ } } - // reboot the writer on the new index - // TODO: perhaps this is no longer necessary then? - // core.getUpdateHandler().newIndexWriter(true); - + // ensure the writer is init'd + RefCounted<IndexWriter> iw = core.getUpdateHandler().getSolrCoreState().getIndexWriter(core); + iw.decref(); } catch (IOException e) { LOG.warn("Unable to get IndexCommit on startup", e); } finally {
          Hide
          Mark Miller added a comment - - edited

          was that the motivation for this change

          No, based on the comment I'm thinking I couldn't see why it was necessary and no test failed without it.

          However, when I try to uncomment it, the tests seem to show problems given the work done in SOLR-3911. Perhaps that is part of why it's commented out as well, I don't remember.

          The above tries to just make sure the writer has been init'd rather than forcing a new one.

          Show
          Mark Miller added a comment - - edited was that the motivation for this change No, based on the comment I'm thinking I couldn't see why it was necessary and no test failed without it. However, when I try to uncomment it, the tests seem to show problems given the work done in SOLR-3911 . Perhaps that is part of why it's commented out as well, I don't remember. The above tries to just make sure the writer has been init'd rather than forcing a new one.
          Hide
          Gregg Donovan added a comment -

          Mark – Your new patch works perfectly for us. I like the approach better than just restoring the call to newIndexWriter, as well. If a writer already exists when ReplicationHandler#inform gets called, why make a new one?

          Show
          Gregg Donovan added a comment - Mark – Your new patch works perfectly for us. I like the approach better than just restoring the call to newIndexWriter, as well. If a writer already exists when ReplicationHandler#inform gets called, why make a new one?
          Hide
          Mark Miller added a comment -

          Thanks Gregg - I've added a test that catches this issue as well. I'll commit shortly.

          Show
          Mark Miller added a comment - Thanks Gregg - I've added a test that catches this issue as well. I'll commit shortly.
          Hide
          Mark Miller added a comment -

          Committed - accidentally tagged it as SOLR-3911 though.

          Show
          Mark Miller added a comment - Committed - accidentally tagged it as SOLR-3911 though.
          Hide
          Commit Tag Bot added a comment -

          [trunk commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1440518

          SOLR-4380: fix JIRA issue number

          Show
          Commit Tag Bot added a comment - [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1440518 SOLR-4380 : fix JIRA issue number
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1440521

          SOLR-4380: fix JIRA issue number

          Show
          Commit Tag Bot added a comment - [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1440521 SOLR-4380 : fix JIRA issue number
          Hide
          Uwe Schindler added a comment -

          Closed after release.

          Show
          Uwe Schindler added a comment - Closed after release.

            People

            • Assignee:
              Mark Miller
              Reporter:
              Gregg Donovan
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development