Solr
SOLR-2326

Replication command indexversion fails to return index version

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 4.9, 5.0
    • Component/s: replication (java)
    • Labels: None
    • Environment: Branch 3x latest

      Description

      To test this, I took the /example/multicore/core0 solrconfig and added a simple replication handler:

      <requestHandler name="/replication" class="solr.ReplicationHandler" >
        <lst name="master">
          <str name="replicateAfter">commit</str>
          <str name="replicateAfter">startup</str>
          <str name="confFiles">schema.xml</str>
        </lst>
      </requestHandler>

      When I query the handler for details I get back the indexVersion that I expect: http://localhost:8983/solr/core0/replication?command=details&wt=json&indent=true

      But when I ask for just the indexVersion I get back a 0, which prevents the slaves from pulling updates: http://localhost:8983/solr/core0/replication?command=indexversion&wt=json&indent=true
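
      For anyone reproducing this outside a browser, here is a small illustrative Java client (not part of Solr; host, port and core name are simply the ones from the URLs above) that fetches both endpoints and prints the raw responses, which makes the zero indexversion easy to spot:

      import java.io.BufferedReader;
      import java.io.InputStreamReader;
      import java.net.URL;

      // Illustrative only: query the details and indexversion commands of the
      // replication handler configured above and dump whatever comes back.
      public class ReplicationCheck {
        public static void main(String[] args) throws Exception {
          String base = "http://localhost:8983/solr/core0/replication?wt=json&indent=true&command=";
          for (String cmd : new String[] {"details", "indexversion"}) {
            System.out.println("== " + cmd + " ==");
            try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(base + cmd).openStream(), "UTF-8"))) {
              String line;
              while ((line = in.readLine()) != null) {
                System.out.println(line);
              }
            }
          }
        }
      }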

        Activity

        Uwe Schindler added a comment -

        Move issue to Solr 4.9.

        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Robert Muir added a comment -

        moving all 4.0 issues not touched in a month to 4.1

        Robert Muir added a comment -

        rmuir20120906-bulk-40-change

        David Sobon added a comment -

        I'm having similar problems.

        Fresh install, with populated index from apachesolr (drupal module / interface)

        Versions:
        solr 3.6.0
        jetty 6.1.26
        java-sun build 1.6.0_26-b03

        Master: details
        <str name="isMaster">false</str>
        <str name="isSlave">false</str>
        <long name="indexVersion">1343699948493</long>
        <long name="generation">128</long>

        Master: indexversion
        <long name="indexversion">0</long>
        <long name="generation">0</long>

        Slave: details
        <str name="isMaster">false</str>
        <str name="isSlave">true</str>
        <str name="timesFailed">1444</str>
        <str name="isPollingDisabled">false</str>
        <str name="isReplicating">false</str>

        Slave: indexversion
        <long name="indexversion">0</long>
        <long name="generation">0</long>

        Are there any facilities to debug this problem? Not having error messages explaining why the master is not acting as a master is a bug.

        Hoss Man added a comment -

        Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in the comment.

        Jan Høydahl added a comment -

        Forget the above, I discovered a misconfiguration on our side. We started Jetty without -Dsolr.enable.master=true, as required by our solrconfig.xml setup. Now it's OK. Opened a new issue, SOLR-3176, for Solr to give better logging in this case.

        Jan Høydahl added a comment -

        We're also seeing the same here on a 3.5 master: the indexversion command returns 0, while the details command shows correct versions. I have not verified that it's because of locking, but based on behaviour and code inspection I'm pretty sure we don't get a valid commitPoint. Tried an optimize on the master, and also multiple core reloads, but it does not help. A restart of the master did not help either.

        So it seems we should clean up the locking in forceOpenWriter() to make sure we always have a valid commit point.
        And should we get commitPoint == null, it would be better to fetch versions from the getIndexVersion() method than to return 0, wouldn't it?
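
        As a rough illustration of that suggestion, here is a minimal, untested sketch of the CMD_INDEX_VERSION branch (the same branch quoted later in this issue) with such a fallback; the getIndexVersion() helper is assumed to exist as described above:

        // Sketch only: fall back to the live index version instead of answering 0/0
        // when the cached commit point is missing but replication is enabled.
        if (commitPoint != null && replicationEnabled.get()) {
          core.getDeletionPolicy().setReserveDuration(commitPoint.getVersion(), reserveCommitDuration);
          rsp.add(CMD_INDEX_VERSION, commitPoint.getVersion());
          rsp.add(GENERATION, commitPoint.getGeneration());
        } else if (replicationEnabled.get()) {
          // Assumed helper per the comment above: returns {version, generation}
          // read from the newest searcher's reader.
          long[] versionAndGeneration = getIndexVersion();
          rsp.add(CMD_INDEX_VERSION, versionAndGeneration[0]);
          rsp.add(GENERATION, versionAndGeneration[1]);
        } else {
          rsp.add(CMD_INDEX_VERSION, 0L);
          rsp.add(GENERATION, 0L);
        }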

        Matt Traynham added a comment - edited

        One more heads up: I have found that if you change the lockType to single/noLock, the problem goes away. Which in turn is a good solution for me, since I have this server configured as a repeater and have only found this problem with repeaters.

        Matt Traynham added a comment - edited

        Hey Yury, I recently started seeing this same issue and thought I'd provide a bit of input into what I found while debugging my 3.3 branch.
        I have found that a core reload does break the subsequent commits call. But if you reload a second time, it is fixed again. This is because it forcefully opens a new writer, and a lock exception occurs every other time.

        During the inform method of ReplicationHandler, if you have configured replicate after startup, the direct update handler will forceOpenWriter().

        ReplicationHandler.java
        if (replicateAfter.contains("startup")) {
                        replicateOnStart = true;
                        RefCounted<SolrIndexSearcher> s = core.getNewestSearcher(false);
                        try {
                            IndexReader reader = s==null ? null : s.get().getReader();
                            if (reader!=null && reader.getIndexCommit() != null && reader.getIndexCommit().getGeneration() != 1L) {
                                try {
                                    if(replicateOnOptimize){
                                        Collection<IndexCommit> commits = IndexReader.listCommits(reader.directory());
                                        for (IndexCommit ic : commits) {
                                            if(ic.isOptimized()){
                                                if(indexCommitPoint == null || indexCommitPoint.getVersion() < ic.getVersion()) indexCommitPoint = ic;
                                            }
                                        }
                                    } else{
                                        indexCommitPoint = reader.getIndexCommit();
                                    }
                                } finally {
                                    // We don't need to save commit points for replication, the SolrDeletionPolicy
                                    // always saves the last commit point (and the last optimized commit point, if needed)
                                    /***
                                    if(indexCommitPoint != null){
                                        core.getDeletionPolicy().saveCommitPoint(indexCommitPoint.getVersion());
                                     }
                                     ***/
                                }
                            }
                            if (core.getUpdateHandler() instanceof DirectUpdateHandler2) {
                                ((DirectUpdateHandler2) core.getUpdateHandler()).forceOpenWriter();
                            } else {
                                LOG.warn("The update handler being used is not an instance or sub-class of DirectUpdateHandler2. " +
                                        "Replicate on Startup cannot work.");
                            } 
        

        Which will request a new lock, open a new writer, and unlock. If a lock already exists, an
        org.apache.lucene.store.LockObtainFailedException ("Lock obtain timed out") is thrown and it actually bails out of creating a new writer.

        DirectUpdateHandler2.java
           public void forceOpenWriter() throws IOException  {
            iwCommit.lock();
            try {
              openWriter();
            } finally {
              iwCommit.unlock();
            }
          }
        

        The openWriter method goes on to create a new SolrIndexWriter as well as a few other objects like IndexFileDeleter and IndexDeletionPolicyWrapper, which actually holds the commitPoints.

        IndexDeletionPolicyWrapper.java
        private volatile Map<Long, IndexCommit> solrVersionVsCommits = new ConcurrentHashMap<Long, IndexCommit>();
        
          /**
           * Internal use for Lucene... do not explicitly call.
           */
          public void onInit(List list) throws IOException {
            List<IndexCommitWrapper> wrapperList = wrap(list);
            deletionPolicy.onInit(wrapperList);
            updateCommitPoints(wrapperList);
            cleanReserves();
          }
        
          private void updateCommitPoints(List<IndexCommitWrapper> list) {
            Map<Long, IndexCommit> map = new ConcurrentHashMap<Long, IndexCommit>();
            for (IndexCommitWrapper wrapper : list) {
              if (!wrapper.isDeleted())
                map.put(wrapper.getVersion(), wrapper.delegate);
            }
            solrVersionVsCommits = map;
            latestCommit = ((list.get(list.size() - 1)).delegate);
          }
        
          /**
           * Gets the commit points for the index.
           * This map instance may change between commits and commit points may be deleted.
           * It is recommended to reserve a commit point for the duration of usage
           *
           * @return a Map of version to commit points
           */
          public Map<Long, IndexCommit> getCommits() {
            return solrVersionVsCommits;
          }
        

        The problem is that if a writer never gets created correctly, the onInit method on the IndexDeletionPolicyWrapper never gets called and the solrVersionVsCommits map stays empty. If anyone has any input on a solution, that would be greatly appreciated.

        Thanks,
        Matt
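
        One possible mitigation for the startup path described above, sketched here purely as a hypothetical against the 3.x snippets quoted in this comment (not a tested patch), is to treat a lock failure in forceOpenWriter() as recoverable instead of letting it silently skip writer creation:

        // Hypothetical sketch: if the startup forceOpenWriter() hits the stale index
        // lock described above, retry once instead of leaving the deletion policy
        // (and therefore solrVersionVsCommits) uninitialized.
        if (core.getUpdateHandler() instanceof DirectUpdateHandler2) {
          DirectUpdateHandler2 handler = (DirectUpdateHandler2) core.getUpdateHandler();
          try {
            handler.forceOpenWriter();
          } catch (org.apache.lucene.store.LockObtainFailedException e) {
            LOG.warn("forceOpenWriter could not obtain the index lock; retrying once", e);
            try {
              Thread.sleep(1000);
              handler.forceOpenWriter();
            } catch (Exception retryFailure) {
              LOG.error("Replicate on Startup could not open a writer; commit points " +
                  "will be missing until the next successful commit", retryFailure);
            }
          }
        } else {
          LOG.warn("The update handler being used is not an instance or sub-class of DirectUpdateHandler2. " +
              "Replicate on Startup cannot work.");
        }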

        Mark Miller added a comment -

        I don't think I ever replicated the original issue - we can push this to 3.6. I'd like to look at it one more time before giving up, but I don't know when.

        Simon Willnauer added a comment -

        mark, any chance this test/fix is coming in any time soon?

        Robert Muir added a comment -

        3.4 -> 3.5

        Mark Miller added a comment -

        I think your main problem is this bug Yury: SOLR-2705

        I have a test case that I have to polish and a fix coming soon.

        Mark Miller added a comment -

        Hey Yury - I think you may have hit a different worse bug here. Looking into it.

        Yury Kats added a comment - edited

        Looks like all failing code paths in ReplicationHandler lead to

        core.getDeletionPolicy()
        

        So either ReplicationHandler holds on to stale core instance or core's DeletionPolicy is not properly initialized when the core is being reloaded.

        Yury Kats added a comment -

        Another data point, /replication?command=commits

        After Solr startup:

        http://localhost:8983/solr/master1/replication?command=commits
        <response>
        <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
        </lst>
        <arr name="commits">
        <lst>
        <long name="indexVersion">1312988447203</long>
        <long name="generation">2</long>
        <arr name="filelist">
        <str>_0.nrm</str>
        <str>_0_0.frq</str>
        <str>_0_0.tiv</str>
        ... and so on

        Now RELOAD the core:
        http://localhost:8983/solr/admin/cores?action=RELOAD&core=master1

        Now repeat "commits" command:
        http://localhost:8983/solr/master1/replication?command=commits
        <response>
        <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">0</int>
        </lst>
        <arr name="commits"/>
        </response>

        And it stays this way forever, even when new docs are being committed to the RELOADed core.

        Yury Kats added a comment -

        Could ReplicationHandler be holding on to a stale "core" instance after RELOAD?

        Yury Kats added a comment -

        OK, after some troubleshooting, my case is more obscure, but exposes the same problem.

        I have a "master" core with an index. Replication is configured after startup and after commit.
        I start Solr and ReplicationHandler reports correct information.
        "indexversion" is non-zero and "details" shows me a filelist for a specific index generation.
        All is well, replication is running fine.

        Now, my application RELOADs the "master" core (either by using /admin/cores?action=RELOAD or by using action=CREATE to recreate the same core in the same place; in both cases the index is preserved).

        Once the core is RELOADed, replication "details" still shows correct indexversion and generation, but the filelist is gone. And once a new commit happens on the RELOADed core, "indexversion" command starts reporting zero and never recovers.

        Therefore replication stops for good. The only way to make it replicate again is to restart Solr.

        Eric Pugh added a comment -

        I am out of the office 8/9 through 8/14. For urgent issues, please
        contact Jason Hull at jhull@opensourceconnections.com or phone at
        (434) 409-8451.

        Yury Kats added a comment -

        I'm running into the same problem as well.
        /replication?command=indexversion returns a non-zero value upon startup, but turns into a zero after a commit. An "optimize" does not seem to bring it back.

        Jeremy Custenborder added a comment - edited

        I'm running into the same issue. My slave server has no update handlers. Calling /solr/core/replication?command=indexversion on the master always returned 0. I was looking at the code for the handler and found an interesting comment on line 125. It's currently configured to replicate after commit.

        This happens when replication is not configured to happen after startup and no commit/optimize
        has happened yet.

        This got me thinking so I issued the following command against the master

        curl 'http://127.0.0.1:8080/solr/core/update' -H "Content-Type: text/xml" --data-binary '<optimize/>'

        The next call to /solr/core/replication?command=indexversion returned a valid version and replication to the slave started.

        This makes me believe the problem is in this code block.

           if (command.equals(CMD_INDEX_VERSION)) {
              IndexCommit commitPoint = indexCommitPoint;  // make a copy so it won't change
              if (commitPoint != null && replicationEnabled.get()) {
                //
                // There is a race condition here.  The commit point may be changed / deleted by the time
                // we get around to reserving it.  This is a very small window though, and should not result
                // in a catastrophic failure, but will result in the client getting an empty file list for
                // the CMD_GET_FILE_LIST command.
                //
                core.getDeletionPolicy().setReserveDuration(commitPoint.getVersion(), reserveCommitDuration);
                rsp.add(CMD_INDEX_VERSION, commitPoint.getVersion());
                rsp.add(GENERATION, commitPoint.getGeneration());
              } else {
                // This happens when replication is not configured to happen after startup and no commit/optimize
                // has happened yet.
                rsp.add(CMD_INDEX_VERSION, 0L);
                rsp.add(GENERATION, 0L);
              }
            }
        

        It looks like there is a race condition resulting in indexCommitPoint being null. Look at the postCommit() method inside the getEventListener() method.

        public void postCommit() {
                IndexCommit currentCommitPoint = core.getDeletionPolicy().getLatestCommit();
        
                if (getCommit) {
                  // IndexCommit oldCommitPoint = indexCommitPoint;
                  indexCommitPoint = currentCommitPoint;
        
                  // We don't need to save commit points for replication, the SolrDeletionPolicy
                  // always saves the last commit point (and the last optimized commit point, if needed)
                  /***
                  if (indexCommitPoint != null) {
                    core.getDeletionPolicy().saveCommitPoint(indexCommitPoint.getVersion());
                  }
                  if(oldCommitPoint != null){
                    core.getDeletionPolicy().releaseCommitPoint(oldCommitPoint.getVersion());
                  }
                  ***/
                }
                if (snapshoot) {
                  try {
                    SnapShooter snapShooter = new SnapShooter(core, null);
                    snapShooter.createSnapAsync(currentCommitPoint, ReplicationHandler.this);
                  } catch (Exception e) {
                    LOG.error("Exception while snapshooting", e);
                  }
                }
              }
        

        This is the first time I see indexCommitPoint being set. Since indexCommitPoint is null until it is set here, a value of 0 is always returned.

        If you call optimize like I did, does your index start replicating? In my situation, each core that returned 0/0 started replicating after I called optimize.
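
        One hypothetical way around the null window described above would be to seed indexCommitPoint when the handler is initialized, using only calls that already appear in the snippets quoted in this issue; a rough, untested sketch:

        // Rough sketch: in ReplicationHandler's initialization path, pre-populate
        // indexCommitPoint from the deletion policy so CMD_INDEX_VERSION has a
        // value before the first postCommit event ever fires.
        IndexCommit latestCommit = core.getDeletionPolicy().getLatestCommit();
        if (latestCommit != null) {
          indexCommitPoint = latestCommit;
        }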

        Robert Muir added a comment -

        Bulk move 3.2 -> 3.3

        Mark Miller added a comment -

        Yeah - I actually won't be able to dig in for a bit - so push is fine with me.

        Robert Muir added a comment -

        There isn't any patch here yet, can we move out to 3.2?

        Eric Pugh added a comment -

        So I did discover one odd thing. If I don't have a /update requestHandler listed in the solrconfig.xml, then the commitPoint is ALWAYS null; it's almost as if having that handler in the stack is what causes the commitPoint to be set.

        My other data point, which I think but haven't verified, is that if you don't have replicate-on-startup set, it seems to give the same result.

        One question I have is why that race condition exists. I mean, if command=details works, shouldn't indexversion work the same, or raise an error, rather than returning a rather unuseful 0? Maybe just logging "no commitPoint found" would help.
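
        A sketch of the kind of logging that suggestion implies, dropped into the else branch quoted elsewhere in this issue (illustrative only, not a committed change):

        } else {
          // Illustrative: surface the missing commit point in the logs instead of
          // silently answering 0/0.
          LOG.warn("No commitPoint found for CMD_INDEX_VERSION (replicationEnabled="
              + replicationEnabled.get() + "); returning 0/0");
          rsp.add(CMD_INDEX_VERSION, 0L);
          rsp.add(GENERATION, 0L);
        }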

        Mark Miller added a comment -

        Thanks for the additional info Eric,

        So just to clarify: when you don't do a MERGE, you don't ever see this problem?

        The code snippet you have does look, at first glance, like where the action is likely happening - but it seems very odd that you would still have the problem after an addDoc+commit, eh?

        I'll try and see if I can work up a unit test.

        Eric Pugh added a comment -

        Mark, thanks for taking a look at this. I've been dinking around with this bug, and I think it's because I am using the MERGE command to merge two indexes together. When I do that, I get indexVersion 0. After doing a merge, even after adding a new document and doing a commit, it doesn't work. I think it has to do with these lines from ReplicationHandler and a lack of a commitPoint:

        if (commitPoint != null && replicationEnabled.get()) {
          //
          // There is a race condition here.  The commit point may be changed / deleted by the time
          // we get around to reserving it.  This is a very small window though, and should not result
          // in a catastrophic failure, but will result in the client getting an empty file list for
          // the CMD_GET_FILE_LIST command.
          //
          core.getDeletionPolicy().setReserveDuration(commitPoint.getVersion(), reserveCommitDuration);
          rsp.add(CMD_INDEX_VERSION, commitPoint.getVersion());
          rsp.add(GENERATION, commitPoint.getGeneration());
        } else {
          // This happens when replication is not configured to happen after startup and no commit/optimize
          // has happened yet.
          rsp.add(CMD_INDEX_VERSION, 0L);
          rsp.add(GENERATION, 0L);
        }

          People

          • Assignee: Mark Miller
          • Reporter: Eric Pugh
          • Votes: 5
          • Watchers: 6

            Dates

            • Created:
              Updated:
