Solr
  1. Solr
  2. SOLR-4413

SolrCore#getIndexDir() contract change between 3.6 and 4.1

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0, 4.1
    • Fix Version/s: 4.2, 6.0
    • Component/s: None
    • Labels:
      None

      Description

      In SVN 1420992, SolrCore#getIndexDir() changed it's implementation from a version that would reflect the value of the index property in index.properties to one that does not.

      In 3.6, SolrCore#getIndexDir() was:

      public String getIndexDir() {
        synchronized (searcherLock) {
          if (_searcher == null)	  	
            return dataDir + "index/";
          SolrIndexSearcher searcher = _searcher.get();
          return searcher.getIndexDir() == null ? dataDir + "index/" : searcher.getIndexDir();
      }
      

      In 3.6, SolrIndexSearcher would be passed the value of SolrCore#getNewIndexDir() – which reads index.properties – in its constructor and return it when SolrIndexSearcher#getIndexDir() was called.

      In 4.1, SolrCore#getIndexDir() is:

        public String getIndexDir() {  
          return dataDir + "index/";
        }
      

      Clients of SolrCore#getIndexDir() that were expecting the previous behavior are likely to have issues. E.g.:

      --In CoreAdminHandler#handleUnloadAction(SolrQueryRequest, SolrQueryResponse) if the deleteIndex flag is set to true, it calls core.getDirectoryFactory().remove(core.getIndexDir()). If a value other than index/ is set in index.properties, the wrong directory will be deleted.

      --In CoreAdminHandler#getIndexSize(SolrCore), the existence of SolrCore#getIndexDir() is checked before SolrCore#getNewIndexDir(). If a value other than index/ is set in index.properties, this will return the size of the wrong directory.

      1. SOLR-4413.patch
        7 kB
        Mark Miller
      2. SOLR-4413.patch
        5 kB
        Gregg Donovan

        Issue Links

          Activity

          Hide
          Mark Miller added a comment -

          AFAICT this changed when I finished making Directory first class. You can't so easily get the index dir from the searcher (which got it from the fs directory). I'll see what we can do to improve it for 4.2.

          Show
          Mark Miller added a comment - AFAICT this changed when I finished making Directory first class. You can't so easily get the index dir from the searcher (which got it from the fs directory). I'll see what we can do to improve it for 4.2.
          Hide
          Gregg Donovan added a comment -

          This patch is the full set of changes we needed to make the index.properties edit + core reload reindexing mechanism that we used in 3.6 work in 4.1.

          In addition to the changes to SolrCore#getIndexDir(), we needed to check for a change of directory when the new SolrCore decided to either inherit a DirectoryFactory, DeletionPolicy, or UpdateHandler from the previous core or reuse the previous core's Directory for a searcher. In each case, the state from the previous core would cause problems when inherited by the new core.

          The test suite passes with the full set of changes, but I'm not sure they should all be grouped in one JIRA or not. Mark – please let me know if you think they should be broken out. If not, perhaps we should change the title of this ticket?

          Show
          Gregg Donovan added a comment - This patch is the full set of changes we needed to make the index.properties edit + core reload reindexing mechanism that we used in 3.6 work in 4.1. In addition to the changes to SolrCore#getIndexDir(), we needed to check for a change of directory when the new SolrCore decided to either inherit a DirectoryFactory, DeletionPolicy, or UpdateHandler from the previous core or reuse the previous core's Directory for a searcher. In each case, the state from the previous core would cause problems when inherited by the new core. The test suite passes with the full set of changes, but I'm not sure they should all be grouped in one JIRA or not. Mark – please let me know if you think they should be broken out. If not, perhaps we should change the title of this ticket?
          Hide
          Mark Miller added a comment -

          In each case, the state from the previous core would cause problems when inherited by the new core.

          Could you elaborate on these problems?

          Show
          Mark Miller added a comment - In each case, the state from the previous core would cause problems when inherited by the new core. Could you elaborate on these problems?
          Hide
          Gregg Donovan added a comment -

          Could you elaborate on these problems?

          SolrDeletionPolicy
          In Solr 4.1, a core reload sets the new deletion policy to be the previous core's IndexDeletionPolicyWrapper:

               SolrCore core = new SolrCore(getName(), getDataDir(), config,
                   schema, coreDescriptor, updateHandler, prev);
               core.solrDelPolicy = this.solrDelPolicy;
               return core;
          

          This would cause problems when trying to get the commit points because the commits from the previous directory would be inherited. In our case, we would start with an empty index at index/. Then, after a reindex, we would have a new index at index-<timestamp>/ at generation 2 (we do one incremental after a reindex before swapping it in). When the ReplicationHandler tried to find the commits for generation 2 during replication, it couldn't find them. core.getDeletionPolicy().getCommits() would yield the map for index/, not index-<timestamp>/.

          Directory
          Solr 4.1 has an optimization to use the IndexWriter from the previous SolrCore to create a new DirectoryReader:

                // use the (old) writer to open the first searcher
                RefCounted<IndexWriter> iwRef = null;
                if (prev != null) {
                  iwRef = prev.getUpdateHandler().getSolrCoreState().getIndexWriter(null);
                  if (iwRef != null) {
                    final IndexWriter iw = iwRef.get();
                    newReaderCreator = new Callable<DirectoryReader>() {
                      @Override
                      public DirectoryReader call() throws Exception {
                        return DirectoryReader.open(iw, true);
                      }
                    };
                  }
                }
          

          In our case, the old core refers to index/ and the core being created is at index-<timestamp>/, so the DirectoryReader from the old core points to a directory that is no longer in use. After doing the index.properties update and core reload, the newly created SolrIndexSearcher would still point to index/. It would take an empty commit to get a new reader pointed at index-<timestamp>/.

          UpdateHandler
          The UpdateHandler has a reference to the SolrCoreState/DefaultSolrCoreState which has a reference to the SolrIndexWriter. So, when we inherited the UpdateHandler from index/ after the reload pointed to index-<timestamp>/, commits would still go to index/.

          It's unfortunate to lose all of the stats in DirectUpdateHandler2, but maybe it makes sense to reset them anyway when the Directory changes?

          DirectoryFactory
          I should clarify: the reason for not inheriting DirectoryFactory from the previous core was not actually related to state. Rather, the current logic in 4.1 for whether to inherit the DirectoryFactory is:

                if (updateHandler == null) {
                  initDirectoryFactory();
                  solrCoreState = new DefaultSolrCoreState(getDirectoryFactory());
                } else {
                  solrCoreState = updateHandler.getSolrCoreState();
                  directoryFactory = solrCoreState.getDirectoryFactory();
                  this.isReloaded = true;
                }
          

          I removed inheriting of the DirectoryFactory of the previous core because SolrCore#getNewIndexDir() uses this.directoryFactory to read index.properties and getNewIndexDir() was needed to determine whether the new core was using the same directory as the previous core. Perhaps this can be solved by creating a private version of SolrCore#getNewIndexDir that takes a DirectoryFactory as an argument? On second look I don't think I see state in DirectoryFactory that should cause problems when a new directory is used.

          Show
          Gregg Donovan added a comment - Could you elaborate on these problems? SolrDeletionPolicy In Solr 4.1, a core reload sets the new deletion policy to be the previous core's IndexDeletionPolicyWrapper : SolrCore core = new SolrCore(getName(), getDataDir(), config, schema, coreDescriptor, updateHandler, prev); core.solrDelPolicy = this .solrDelPolicy; return core; This would cause problems when trying to get the commit points because the commits from the previous directory would be inherited. In our case, we would start with an empty index at index/ . Then, after a reindex, we would have a new index at index-<timestamp>/ at generation 2 (we do one incremental after a reindex before swapping it in). When the ReplicationHandler tried to find the commits for generation 2 during replication, it couldn't find them. core.getDeletionPolicy().getCommits() would yield the map for index/ , not index-<timestamp>/ . Directory Solr 4.1 has an optimization to use the IndexWriter from the previous SolrCore to create a new DirectoryReader : // use the (old) writer to open the first searcher RefCounted<IndexWriter> iwRef = null ; if (prev != null ) { iwRef = prev.getUpdateHandler().getSolrCoreState().getIndexWriter( null ); if (iwRef != null ) { final IndexWriter iw = iwRef.get(); newReaderCreator = new Callable<DirectoryReader>() { @Override public DirectoryReader call() throws Exception { return DirectoryReader.open(iw, true ); } }; } } In our case, the old core refers to index/ and the core being created is at index-<timestamp>/ , so the DirectoryReader from the old core points to a directory that is no longer in use. After doing the index.properties update and core reload, the newly created SolrIndexSearcher would still point to index/ . It would take an empty commit to get a new reader pointed at index-<timestamp>/ . UpdateHandler The UpdateHandler has a reference to the SolrCoreState / DefaultSolrCoreState which has a reference to the SolrIndexWriter . So, when we inherited the UpdateHandler from index/ after the reload pointed to index-<timestamp>/ , commits would still go to index/ . It's unfortunate to lose all of the stats in DirectUpdateHandler2 , but maybe it makes sense to reset them anyway when the Directory changes? DirectoryFactory I should clarify: the reason for not inheriting DirectoryFactory from the previous core was not actually related to state. Rather, the current logic in 4.1 for whether to inherit the DirectoryFactory is: if (updateHandler == null ) { initDirectoryFactory(); solrCoreState = new DefaultSolrCoreState(getDirectoryFactory()); } else { solrCoreState = updateHandler.getSolrCoreState(); directoryFactory = solrCoreState.getDirectoryFactory(); this .isReloaded = true ; } I removed inheriting of the DirectoryFactory of the previous core because SolrCore#getNewIndexDir() uses this.directoryFactory to read index.properties and getNewIndexDir() was needed to determine whether the new core was using the same directory as the previous core. Perhaps this can be solved by creating a private version of SolrCore#getNewIndexDir that takes a DirectoryFactory as an argument? On second look I don't think I see state in DirectoryFactory that should cause problems when a new directory is used.
          Hide
          Mark Miller added a comment -

          Thanks Gregg - not sure how soon I can get to this stuff, but I'll certainly try to help you address all of this for 4.2.

          Show
          Mark Miller added a comment - Thanks Gregg - not sure how soon I can get to this stuff, but I'll certainly try to help you address all of this for 4.2.
          Hide
          Gregg Donovan added a comment -

          Thanks, Mark – much appreciated.

          Show
          Gregg Donovan added a comment - Thanks, Mark – much appreciated.
          Hide
          Mark Miller added a comment -

          The UpdateHandler has a reference to the SolrCoreState/DefaultSolrCoreState which has a reference to the SolrIndexWriter. So, when we inherited the UpdateHandler from index/ after the reload pointed to index-<timestamp>/, commits would still go to index/.

          That seems strange - we handle that in replication by starting up a new writer when we switch indexes.

          Show
          Mark Miller added a comment - The UpdateHandler has a reference to the SolrCoreState/DefaultSolrCoreState which has a reference to the SolrIndexWriter. So, when we inherited the UpdateHandler from index/ after the reload pointed to index-<timestamp>/, commits would still go to index/. That seems strange - we handle that in replication by starting up a new writer when we switch indexes.
          Hide
          Mark Miller added a comment -

          Can you try this patch and see how close it gets you? We may have to go back and forth once or twice.

          Show
          Mark Miller added a comment - Can you try this patch and see how close it gets you? We may have to go back and forth once or twice.
          Hide
          Mark Miller added a comment -

          Actually, I think you probably need that patch + SOLR-4417

          Show
          Mark Miller added a comment - Actually, I think you probably need that patch + SOLR-4417
          Hide
          Commit Tag Bot added a comment -

          [trunk commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1447089

          SOLR-4413: Fix SolrCore#getIndexDir() to return the current index directory.

          SOLR-4469: A new IndexWriter must be opened on SolrCore reload when the index directory has changed and the previous SolrCore's state should not be propagated.

          Show
          Commit Tag Bot added a comment - [trunk commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1447089 SOLR-4413 : Fix SolrCore#getIndexDir() to return the current index directory. SOLR-4469 : A new IndexWriter must be opened on SolrCore reload when the index directory has changed and the previous SolrCore's state should not be propagated.
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] Mark Robert Miller
          http://svn.apache.org/viewvc?view=revision&revision=1447090

          SOLR-4413: Fix SolrCore#getIndexDir() to return the current index directory.

          SOLR-4469: A new IndexWriter must be opened on SolrCore reload when the index directory has changed and the previous SolrCore's state should not be propagated.

          Show
          Commit Tag Bot added a comment - [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1447090 SOLR-4413 : Fix SolrCore#getIndexDir() to return the current index directory. SOLR-4469 : A new IndexWriter must be opened on SolrCore reload when the index directory has changed and the previous SolrCore's state should not be propagated.
          Hide
          Mark Miller added a comment -

          I opened SOLR-4469 for part of this. Everything is in now. Let me know where you run into problems.

          Show
          Mark Miller added a comment - I opened SOLR-4469 for part of this. Everything is in now. Let me know where you run into problems.
          Hide
          Amit Nithian added a comment -

          Is there a way to get this into a minor release of 4.1 (4.1.1 or something). I think this is impacting the replication handler because if you have your index currently located in an index.<timestamp> directory, a full download will inadvertently always happen because none of the local files will exist since it's thinking the index dir is "index/".

          Show
          Amit Nithian added a comment - Is there a way to get this into a minor release of 4.1 (4.1.1 or something). I think this is impacting the replication handler because if you have your index currently located in an index.<timestamp> directory, a full download will inadvertently always happen because none of the local files will exist since it's thinking the index dir is "index/".
          Hide
          Shawn Heisey added a comment -

          Amit, as I am not a Lucene/Solr committer, I cannot give you an authoritative answer, but I can offer my best guess. I do not think there will be a 4.1.1 release. There is already a pledge by one committer to get 4.2 released in the next 2-3 weeks. It takes almost as much developer effort for a point release as a new minor version, so the effort usually goes towards the new minor version.

          Show
          Shawn Heisey added a comment - Amit, as I am not a Lucene/Solr committer, I cannot give you an authoritative answer, but I can offer my best guess. I do not think there will be a 4.1.1 release. There is already a pledge by one committer to get 4.2 released in the next 2-3 weeks. It takes almost as much developer effort for a point release as a new minor version, so the effort usually goes towards the new minor version.
          Hide
          Uwe Schindler added a comment -

          Closed after release.

          Show
          Uwe Schindler added a comment - Closed after release.

            People

            • Assignee:
              Mark Miller
              Reporter:
              Gregg Donovan
            • Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development