Solr
  1. Solr
  2. SOLR-622

SpellCheckComponent should build or load indices on startup and commits

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.3
    • Component/s: spellchecker
    • Labels:
      None

      Description

      SpellCheckComponent must be able to build/load spell check index on startup and commits. With SOLR-605, SpellCheckComponent can register an event listener for firstSearcher and newSearcher events and rebuild/reload indices as appropriate.

      • If index uses a FSDirectory and exists on disk then just reload it or else build it on firstSearcher event.
      • If index is built from a Solr field then re-build it on newSearcher event.

      This will help avoid having to add QuerySenderListener in configuration and will not force people to issue build on first query.

      All this should be configurable so that people who don't want to rebuild on commits should be able to turn this feature off per configured spell checker.

      1. SOLR-622.patch
        7 kB
        Shalin Shekhar Mangar
      2. SOLR-622.patch
        8 kB
        Shalin Shekhar Mangar
      3. SOLR-622.patch
        11 kB
        Shalin Shekhar Mangar

        Issue Links

          Activity

          Hide
          Shalin Shekhar Mangar added a comment -

          Reloading existing index on firstSearcher cannot work without SOLR-648

          Show
          Shalin Shekhar Mangar added a comment - Reloading existing index on firstSearcher cannot work without SOLR-648
          Hide
          Shalin Shekhar Mangar added a comment -

          I suppose if a reload on firstSearcher fails due to some reason (missing/corrupted index), we should try to build it, right?

          Show
          Shalin Shekhar Mangar added a comment - I suppose if a reload on firstSearcher fails due to some reason (missing/corrupted index), we should try to build it, right?
          Hide
          Shalin Shekhar Mangar added a comment -

          Hmm. The reload OR build logic cannot work correctly with firstSearcher. We cannot rely on the spellCheckIndexDir existing on disk to be a sure sign that only a reload is necessary. It may just be an empty directory/index. In fact the AbstractLuceneSpellChecker creates a FSDirectory on init which means that doing a simple reload may not be enough on startup.

          I think we should always call reload on firstSearcher for all spell checkers and (only for Solr based spell checkers) call build on newSearcher event. The only case where this will be a problem is where an index exists, Solr is stopped, spell checker config is added/enabled and Solr is restarted. Now a reload is done but we do not have any data in the spell check index. If a commit is done or build is called manually, things will take care of themselves.

          Show
          Shalin Shekhar Mangar added a comment - Hmm. The reload OR build logic cannot work correctly with firstSearcher. We cannot rely on the spellCheckIndexDir existing on disk to be a sure sign that only a reload is necessary. It may just be an empty directory/index. In fact the AbstractLuceneSpellChecker creates a FSDirectory on init which means that doing a simple reload may not be enough on startup. I think we should always call reload on firstSearcher for all spell checkers and (only for Solr based spell checkers) call build on newSearcher event. The only case where this will be a problem is where an index exists, Solr is stopped, spell checker config is added/enabled and Solr is restarted. Now a reload is done but we do not have any data in the spell check index. If a commit is done or build is called manually, things will take care of themselves.
          Hide
          Shalin Shekhar Mangar added a comment -

          Issue description edited to add note on enabling/disabling this feature per spell checker through configuration.

          Show
          Shalin Shekhar Mangar added a comment - Issue description edited to add note on enabling/disabling this feature per spell checker through configuration.
          Hide
          Shalin Shekhar Mangar added a comment -

          Adds a firstSearcher and postCommit event listener using API calls. The firstSearcher listener is added for all spell checkers so that reload is called on startup but the postCommit call back is only added if the spellchecker specifies the following in it's configuration:

          <str name="buildOnCommit">true</str>
          

          Adds tests for the same. However, they fail currently. See http://www.nabble.com/Question-on-newSearcher-and-commit-callbacks-tp18667573p18667573.html for details. If the second commit call in the tests is uncommented then the tests pass.

          Show
          Shalin Shekhar Mangar added a comment - Adds a firstSearcher and postCommit event listener using API calls. The firstSearcher listener is added for all spell checkers so that reload is called on startup but the postCommit call back is only added if the spellchecker specifies the following in it's configuration: <str name= "buildOnCommit" > true </str> Adds tests for the same. However, they fail currently. See http://www.nabble.com/Question-on-newSearcher-and-commit-callbacks-tp18667573p18667573.html for details. If the second commit call in the tests is uncommented then the tests pass.
          Hide
          Shalin Shekhar Mangar added a comment -

          This patch uses separate firstSearcher and newSearcher event listeners for each spell checker. All tests pass. I'll commit this shortly.

          We should still try to figure out why the searcher obtained in a postCommit listener does not get access to the latest searcher.

          Show
          Shalin Shekhar Mangar added a comment - This patch uses separate firstSearcher and newSearcher event listeners for each spell checker. All tests pass. I'll commit this shortly. We should still try to figure out why the searcher obtained in a postCommit listener does not get access to the latest searcher.
          Hide
          Yonik Seeley added a comment -

          We should still try to figure out why the searcher obtained in a postCommit listener does not get access to the latest searcher.

          I think the callback happens before the new searcher is opened.
          I think postCommit is also synchronous with the commit process - you don't want to rebuild your spell checking index here because it would freeze out any updates until it's done.

          We have warming mechanism that works for stuff like this... perhaps that is what should be used rather than postCommit callbacks?

          Show
          Yonik Seeley added a comment - We should still try to figure out why the searcher obtained in a postCommit listener does not get access to the latest searcher. I think the callback happens before the new searcher is opened. I think postCommit is also synchronous with the commit process - you don't want to rebuild your spell checking index here because it would freeze out any updates until it's done. We have warming mechanism that works for stuff like this... perhaps that is what should be used rather than postCommit callbacks?
          Hide
          Shalin Shekhar Mangar added a comment -

          I think the callback happens before the new searcher is opened.

          Yes, the searcher is being created after informing the listeners. IMO, that seems a bit counter-intuitive. The commit has happened so I would expect to see the new data.

          We have warming mechanism that works for stuff like this... perhaps that is what should be used rather than postCommit callbacks?

          So the way used in the latest patch using a newSearcher event should be fine, right? It is done in a separate thread through ExecutorService in SolrCore.

          Not relevant to this issue but the newSearcher method is always called with a null currentSearcher, why is that? I mentioned it at http://www.nabble.com/Question-on-newSearcher-and-commit-callbacks-tp18667573p18667573.html

          Show
          Shalin Shekhar Mangar added a comment - I think the callback happens before the new searcher is opened. Yes, the searcher is being created after informing the listeners. IMO, that seems a bit counter-intuitive. The commit has happened so I would expect to see the new data. We have warming mechanism that works for stuff like this... perhaps that is what should be used rather than postCommit callbacks? So the way used in the latest patch using a newSearcher event should be fine, right? It is done in a separate thread through ExecutorService in SolrCore. Not relevant to this issue but the newSearcher method is always called with a null currentSearcher, why is that? I mentioned it at http://www.nabble.com/Question-on-newSearcher-and-commit-callbacks-tp18667573p18667573.html
          Hide
          Yonik Seeley added a comment -

          Yes, the searcher is being created after informing the listeners. IMO, that seems a bit counter-intuitive. The commit has happened so I would expect to see the new data.

          That particular callback was made for index distribution. It needed to be synchronous with the commit process, not allowing anything else to proceed until finished to ensure a clean snapshot. The trigger is closing of the IndexWriter, not opening of a new IndexSearcher.

          We could create new callbacks, but I'm not sure I would want to do that for 1.3

          So the way used in the latest patch using a newSearcher event should be fine, right? It is done in a separate thread through ExecutorService in SolrCore.

          I haven't looked at the patch, but this sounds potentially dangerous if you are using the searcher passed to you (async closes, etc). I've added a comment to the code about this.

          Show
          Yonik Seeley added a comment - Yes, the searcher is being created after informing the listeners. IMO, that seems a bit counter-intuitive. The commit has happened so I would expect to see the new data. That particular callback was made for index distribution. It needed to be synchronous with the commit process, not allowing anything else to proceed until finished to ensure a clean snapshot. The trigger is closing of the IndexWriter, not opening of a new IndexSearcher. We could create new callbacks, but I'm not sure I would want to do that for 1.3 So the way used in the latest patch using a newSearcher event should be fine, right? It is done in a separate thread through ExecutorService in SolrCore. I haven't looked at the patch, but this sounds potentially dangerous if you are using the searcher passed to you (async closes, etc). I've added a comment to the code about this.
          Hide
          Shalin Shekhar Mangar added a comment -

          I haven't looked at the patch, but this sounds potentially dangerous if you are using the searcher passed to you (async closes, etc). I've added a comment to the code about this.

          The spell checker build is synchronous and completed inside the newSearcher call and the reference to searcher is not kept anywhere, we should be fine. In addition, I'll a note to the SolrSpellChecker#build method javadocs that no references to the searcher should be kept by implementations.

          Actually a reference to searcher is kept in the dictionary but the dictionary is used immediately and created everytime in the build method. We can get rid of it easily.

          Show
          Shalin Shekhar Mangar added a comment - I haven't looked at the patch, but this sounds potentially dangerous if you are using the searcher passed to you (async closes, etc). I've added a comment to the code about this. The spell checker build is synchronous and completed inside the newSearcher call and the reference to searcher is not kept anywhere, we should be fine. In addition, I'll a note to the SolrSpellChecker#build method javadocs that no references to the searcher should be kept by implementations. Actually a reference to searcher is kept in the dictionary but the dictionary is used immediately and created everytime in the build method. We can get rid of it easily.
          Hide
          Yonik Seeley added a comment -

          It is done in a separate thread through ExecutorService in SolrCore.

          I may have mis-understood this part... I though you were using your own ExecutorService.
          So yes, using newSearcher and firstSearcher events should be fine. They will be synchronous though (block all other progress on warming and registering the new server). That may or may not be what you want.

          Show
          Yonik Seeley added a comment - It is done in a separate thread through ExecutorService in SolrCore. I may have mis-understood this part... I though you were using your own ExecutorService. So yes, using newSearcher and firstSearcher events should be fine. They will be synchronous though (block all other progress on warming and registering the new server). That may or may not be what you want.
          Hide
          Shalin Shekhar Mangar added a comment -

          Functionally, this patch aspires to eliminate the need to modify solrconfig.xml to add a reload call to firstSearcher and build call to newSearcher using the QuerySenderListener.

          They will be synchronous though (block all other progress on warming and registering the new server). That may or may not be what you want.

          Until we can add a scheduler into Solr in the future (which is something a lot of components like replication and DataImportHandler may need), I really don't see any other way for now.

          Show
          Shalin Shekhar Mangar added a comment - Functionally, this patch aspires to eliminate the need to modify solrconfig.xml to add a reload call to firstSearcher and build call to newSearcher using the QuerySenderListener. They will be synchronous though (block all other progress on warming and registering the new server). That may or may not be what you want. Until we can add a scheduler into Solr in the future (which is something a lot of components like replication and DataImportHandler may need), I really don't see any other way for now.
          Hide
          Shalin Shekhar Mangar added a comment -

          The last patch had some extra code which I was using for testing. This one removes all of those.

          I'll commit this shortly.

          Show
          Shalin Shekhar Mangar added a comment - The last patch had some extra code which I was using for testing. This one removes all of those. I'll commit this shortly.
          Hide
          Shalin Shekhar Mangar added a comment -

          The last patch was generated incorrectly. Uploading new one.

          We are introducing a configuration parameter for spell checker with this issue

          <str name="buildOnCommit">true</str>
          

          Now there is no need to check if a spell checker is built from Solr field or not. If the configuration param is present, we can blindly build it on newSearcher event.

          Show
          Shalin Shekhar Mangar added a comment - The last patch was generated incorrectly. Uploading new one. We are introducing a configuration parameter for spell checker with this issue <str name= "buildOnCommit" > true </str> Now there is no need to check if a spell checker is built from Solr field or not. If the configuration param is present, we can blindly build it on newSearcher event.
          Hide
          Shalin Shekhar Mangar added a comment -

          Committed revision 681604.

          Show
          Shalin Shekhar Mangar added a comment - Committed revision 681604.

            People

            • Assignee:
              Shalin Shekhar Mangar
              Reporter:
              Shalin Shekhar Mangar
            • Votes:
              2 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development