Solr
  1. Solr
  2. SOLR-7888

Make Lucene's AnalyzingInfixSuggester.lookup() method that takes a BooleanQuery filter parameter available in Solr

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 5.2.1
    • Fix Version/s: 5.4, 6.0
    • Component/s: Suggester
    • Labels:
      None

      Description

      LUCENE-6464 has introduced a very flexible lookup method that takes as parameter a BooleanQuery that is used for filtering results.

      This ticket is to expose that method to Solr.

      This would allow user to do:

      /suggest?suggest=true&suggest.build=true&suggest.q=term&suggest.contextFilterQuery=contexts:tennis
      
      /suggest?suggest=true&suggest.build=true&suggest.q=term&suggest.contextFilterQuery=contexts:golf AND contexts:football
      

      etc

      Given that the context filtering in currently only implemented by the

      AnalyzingInfixSuggester

      and by the

      BlendedInfixSuggester

      , this initial implementation will support only these 2 lookup implementations.

      1. SOLR-7888.patch
        37 kB
        Arcadius Ahouansou
      2. SOLR-7888.patch
        38 kB
        Jan Høydahl
      3. SOLR-7888-7963.patch
        52 kB
        Arcadius Ahouansou

        Issue Links

          Activity

          Hide
          Arcadius Ahouansou added a comment -

          Initial patch with a few todo:

          • Fix highlighting
          • merge tests after SOLR-6246 is fixed
          • ..
          Show
          Arcadius Ahouansou added a comment - Initial patch with a few todo: Fix highlighting merge tests after SOLR-6246 is fixed ..
          Hide
          Arcadius Ahouansou added a comment -

          Please Michael McCandless, when you have the chance, could you have a look and let me know whether all this makes any sense?
          Thanks.
          Arcadius

          Show
          Arcadius Ahouansou added a comment - Please Michael McCandless , when you have the chance, could you have a look and let me know whether all this makes any sense? Thanks. Arcadius
          Hide
          Michael McCandless added a comment -

          Hi Arcadius Ahouansou, I agree we should add this to Solr, but I'm not familiar enough with this part of Solr to help here ...

          Show
          Michael McCandless added a comment - Hi Arcadius Ahouansou , I agree we should add this to Solr, but I'm not familiar enough with this part of Solr to help here ...
          Hide
          Jan Høydahl added a comment -

          Thanks for the contribution, looks promising. Just browsed the code, have not tested or looked thoroughly. What analysis is applied to the new contextfield? Do we need it to be configurable?

          I can probably help next week if I find some time...

          Show
          Jan Høydahl added a comment - Thanks for the contribution, looks promising. Just browsed the code, have not tested or looked thoroughly. What analysis is applied to the new contextfield? Do we need it to be configurable? I can probably help next week if I find some time...
          Hide
          Arcadius Ahouansou added a comment -
          Show
          Arcadius Ahouansou added a comment - Thank you Michael McCandless
          Hide
          Arcadius Ahouansou added a comment -

          Hello Jan Høydahl

          The contextField used in the tests is defined as a simple string in the schema.
          We could make things more flexible by defining the type of the field ... similar to the suggestAnalyzerFieldType config.

          I have added this to my todo list.

          Is it OK to have the basic usable functionality out and then, create more tickets to implement further enhancements?

          Thank you very much.

          Arcadius.

          Show
          Arcadius Ahouansou added a comment - Hello Jan Høydahl The contextField used in the tests is defined as a simple string in the schema. We could make things more flexible by defining the type of the field ... similar to the suggestAnalyzerFieldType config. I have added this to my todo list. Is it OK to have the basic usable functionality out and then, create more tickets to implement further enhancements? Thank you very much. Arcadius.
          Hide
          Jan Høydahl added a comment -

          Let's see if it's a an easy one by copying existing code, if not we can do it step wise. Should probably handle multi valued input too..

          Show
          Jan Høydahl added a comment - Let's see if it's a an easy one by copying existing code, if not we can do it step wise. Should probably handle multi valued input too..
          Hide
          Arcadius Ahouansou added a comment - - edited

          Hi Jan Høydahl.
          I will update the patch this weekend and add a

          contextAnalyzerFieldType

          configuration with tests. Should be quite easy.

          By multi-input, you meant

          suggest.contextFilterQuery=contexts:golf&suggest.contextFilterQuery=contexts:football
          

          I will add that as well this weekend.

          In the meanwhile, let me know if there is anything else that I need to add.

          Thank you very much for picking up this.

          Arcadius.

          Show
          Arcadius Ahouansou added a comment - - edited Hi Jan Høydahl . I will update the patch this weekend and add a contextAnalyzerFieldType configuration with tests. Should be quite easy. By multi-input, you meant suggest.contextFilterQuery=contexts:golf&suggest.contextFilterQuery=contexts:football I will add that as well this weekend. In the meanwhile, let me know if there is anything else that I need to add. Thank you very much for picking up this. Arcadius.
          Hide
          Jan Høydahl added a comment -

          By multivalue I mean that the Solr source field could be MV and all values should match against the query. You only need one &suggest.contextFilterQuery=. Is it possible to shorten the param name, to e.g. &suggest.contextQ= or something? And if you only ever can query on "context" (? ), then perhaps that can be defaultField, and the query itself can be simply golf OR football?

          Show
          Jan Høydahl added a comment - By multivalue I mean that the Solr source field could be MV and all values should match against the query. You only need one &suggest.contextFilterQuery= . Is it possible to shorten the param name, to e.g. &suggest.contextQ= or something? And if you only ever can query on "context" (? ), then perhaps that can be defaultField, and the query itself can be simply golf OR football ?
          Hide
          Arcadius Ahouansou added a comment - - edited

          Hello Jan Høydahl

          • The defaultField thing is an excellent idea! Will add that
          • I will change the param to suggest.contextQ
          • About the contextAnalyzerFieldType config, it seems we may not need that because as we know the contextField, I was able to do something like
            Analyzer queryAnalyzer = core.getLatestSchema().getFieldType(contextsField).getQueryAnalyzer()

          About multivalue: the source field in the current test is defined as

          <field name="contexts" type="text" indexed="true" stored="true" multiValued="true"/>

          and we have tests where we index data with multiple contexts

              assertU(adoc("id", "8", "cat", "example inputdata", "price", "45", "weight", "30", "contexts", "ctx2", "contexts", "ctx3"));
          

          and we check both contexts with an AND

              //AND BooleanQuery
              assertQ(req("qt", rh,
                      SuggesterParams.SUGGEST_BUILD, "true",
                      SuggesterParams.SUGGEST_DICT, "suggest_blended_infix_suggester",
                      SuggesterParams.SUGGEST_CONTEXT_FILER_QUERY, "contexts:ctx2 AND contexts:ctx3",
                      SuggesterParams.SUGGEST_Q, "examp",
                      SuggesterParams.SUGGEST_COUNT, "5"),
                  "//lst[@name='suggest']/lst[@name='suggest_blended_infix_suggester']/lst[@name='examp']/int[@name='numFound'][.='1']"
              );
          

          I hope this satisfies your requirement?

          Show
          Arcadius Ahouansou added a comment - - edited Hello Jan Høydahl The defaultField thing is an excellent idea! Will add that I will change the param to suggest.contextQ About the contextAnalyzerFieldType config, it seems we may not need that because as we know the contextField , I was able to do something like Analyzer queryAnalyzer = core.getLatestSchema().getFieldType(contextsField).getQueryAnalyzer() About multivalue: the source field in the current test is defined as <field name= "contexts" type= "text" indexed= " true " stored= " true " multiValued= " true " /> and we have tests where we index data with multiple contexts assertU(adoc( "id" , "8" , "cat" , "example inputdata" , "price" , "45" , "weight" , "30" , "contexts" , "ctx2" , "contexts" , "ctx3" )); and we check both contexts with an AND //AND BooleanQuery assertQ(req( "qt" , rh, SuggesterParams.SUGGEST_BUILD, " true " , SuggesterParams.SUGGEST_DICT, "suggest_blended_infix_suggester" , SuggesterParams.SUGGEST_CONTEXT_FILER_QUERY, "contexts:ctx2 AND contexts:ctx3" , SuggesterParams.SUGGEST_Q, "examp" , SuggesterParams.SUGGEST_COUNT, "5" ), " //lst[@name='suggest']/lst[@name='suggest_blended_infix_suggester']/lst[@name='examp']/ int [@name='numFound'][.='1']" ); I hope this satisfies your requirement?
          Hide
          Jan Høydahl added a comment -

          This all starts to look good. Auto Analyzer choice for ctx field was a smart move, but should still be overridable I guess.

          Show
          Jan Høydahl added a comment - This all starts to look good. Auto Analyzer choice for ctx field was a smart move, but should still be overridable I guess.
          Hide
          Arcadius Ahouansou added a comment -

          Hi Jan Høydahl.
          I am away for a couple of days but will upload the updated patch ASAP. ..
          Thanks.

          Show
          Arcadius Ahouansou added a comment - Hi Jan Høydahl . I am away for a couple of days but will upload the updated patch ASAP. .. Thanks.
          Hide
          Jan Høydahl added a comment -

          Will let others chime in on the param names too. Which one do you like the best?

          • suggest.contextFilterQuery
          • suggest.contextQ
          • suggest.fq
          • suggest.context.fq

          The suggest.fq variant would be in line with filters for main query, but does not indicate that it is for context field only, so people could misunderstand and try filter for other fields.

          Another issue is this line in the patch

          throw new SolrException(SolrException.ErrorCode.BAD_REQUEST,"Context Filtering Query not yet supported by "+lookup.getClass());
          

          Solr supports running multiple suggesters in parallel on one call. Should we require that if you want to use the context filtering, all suggesters must support it (like in current patch) or should we fallback to not filtering for the other suggesters. I guess there are use cases that would want both behaviors...

          Show
          Jan Høydahl added a comment - Will let others chime in on the param names too. Which one do you like the best? suggest.contextFilterQuery suggest.contextQ suggest.fq suggest.context.fq The suggest.fq variant would be in line with filters for main query, but does not indicate that it is for context field only, so people could misunderstand and try filter for other fields. Another issue is this line in the patch throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Context Filtering Query not yet supported by " +lookup.getClass()); Solr supports running multiple suggesters in parallel on one call. Should we require that if you want to use the context filtering, all suggesters must support it (like in current patch) or should we fallback to not filtering for the other suggesters. I guess there are use cases that would want both behaviors...
          Hide
          Arcadius Ahouansou added a comment - - edited

          Hello Jan Høydahl.
          Sorry I have been away for quite a while.

          I have just uploaded an updated version of the patch.
          Changes dones are as follow:

          • No need to pass the query field anymore. the internal AnalyzingInfixSuggester.CONTEXTS_FIELD_NAME is used as query field i.e. the recommended way to filter is just "ctx2 AND ctx3" instead of the old way "contexts:ctx2 AND contexts:ctx3"
          • Most of the logic for parsing queries has been moved from SuggestComponent.java into SolrSuggester.java
          • Multiple suggester can be configured, each one having a different analyzer for the context field as the query parsing in now done in the SolrSuggester.
          • The contextAnalyzerFieldType config is optional and by default, the analyzer of the context field will be used
          • There is a test testContextFilterParamIsIgnoredWhenContextIsNotImplemented() to test that if you send context filtering query to a suggester that does not support context, the filtering is just ignored and suggest return result.
            Along the same line, there is also a test testContextFilteringIsIgnoredWhenContextIsImplementedButNotConfigured()
            So, the implementation is more friendly now.
          • Suggester build will throw exception only if you configure context in solrconfig.xml on a suggester that does not support context. But queries still return normally, just the build operation for the concerned suggester fails.
            There is a test testBuildThrowsIllegalArgumentExceptionWhenContextIsConfiguredButNotImplemented() to cover that
          • buildAll will fail for all (not just the concerned suggester) if you configure context in solrconfig.xml on a suggester that does not support context
          • Regarding the parameter name. The initial implementation if did used the name suggest.cfq for ContextFilterQuery. Then I looked at suggest.dictionary, suggest.reloadAll etc which are plain English. For now, I have not yet changed the name. I will change it as soon as we agreed on that.

          Please let me know in case I have missed anything you mentioned.

          Thank you very much.

          Arcadius

          Show
          Arcadius Ahouansou added a comment - - edited Hello Jan Høydahl . Sorry I have been away for quite a while. I have just uploaded an updated version of the patch. Changes dones are as follow: No need to pass the query field anymore. the internal AnalyzingInfixSuggester.CONTEXTS_FIELD_NAME is used as query field i.e. the recommended way to filter is just "ctx2 AND ctx3" instead of the old way "contexts:ctx2 AND contexts:ctx3" Most of the logic for parsing queries has been moved from SuggestComponent.java into SolrSuggester.java Multiple suggester can be configured, each one having a different analyzer for the context field as the query parsing in now done in the SolrSuggester. The contextAnalyzerFieldType config is optional and by default, the analyzer of the context field will be used There is a test testContextFilterParamIsIgnoredWhenContextIsNotImplemented() to test that if you send context filtering query to a suggester that does not support context, the filtering is just ignored and suggest return result. Along the same line, there is also a test testContextFilteringIsIgnoredWhenContextIsImplementedButNotConfigured() So, the implementation is more friendly now. Suggester build will throw exception only if you configure context in solrconfig.xml on a suggester that does not support context. But queries still return normally, just the build operation for the concerned suggester fails. There is a test testBuildThrowsIllegalArgumentExceptionWhenContextIsConfiguredButNotImplemented() to cover that buildAll will fail for all (not just the concerned suggester) if you configure context in solrconfig.xml on a suggester that does not support context Regarding the parameter name. The initial implementation if did used the name suggest.cfq for ContextFilterQuery. Then I looked at suggest.dictionary , suggest.reloadAll etc which are plain English. For now, I have not yet changed the name. I will change it as soon as we agreed on that. Please let me know in case I have missed anything you mentioned. Thank you very much. Arcadius
          Hide
          Jan Høydahl added a comment -

          Somehow, it feels odd to have suggester-agnostic code in relying on SolrSuggester relying on AnalyzingInfixSuggester.CONTEXTS_FIELD_NAME, even if this property is used also by BlendedInfixSuggester. Perhaps this property should be moved to some other Lucene class as a common global name for context field for all analyzers that supports context filtering, even if they don't subclass any of the existing ones? Perhaps the "Lucene guys" see a broader picture here? Hard to know what comes in the future...

          Regarding a request including suggesters that do not support filtering, I think it depends on its data whether the correct thing is to return an unfiltered response (open data) or to return nothing (sensitive data). Of course, the application has the power to pass suggest.dictionary accordingly if it knows that filtering is done. Alternatively, some suggest.returnUnFilteredSuggestionsIfFilteringIsNotSupported param to control it, I don't know...

          In SolrSuggester#initContextFilterQueryAnalyzer, I think that if CONTEXT_ANALYZER_FIELD_TYPE is explicitly given and wrong, we should fail-fast and throw exception instead of falling back to DocumentDictionaryFactory.CONTEXT_FIELD

          Show
          Jan Høydahl added a comment - Somehow, it feels odd to have suggester-agnostic code in relying on SolrSuggester relying on AnalyzingInfixSuggester.CONTEXTS_FIELD_NAME , even if this property is used also by BlendedInfixSuggester . Perhaps this property should be moved to some other Lucene class as a common global name for context field for all analyzers that supports context filtering, even if they don't subclass any of the existing ones? Perhaps the "Lucene guys" see a broader picture here? Hard to know what comes in the future... Regarding a request including suggesters that do not support filtering, I think it depends on its data whether the correct thing is to return an unfiltered response (open data) or to return nothing (sensitive data). Of course, the application has the power to pass suggest.dictionary accordingly if it knows that filtering is done. Alternatively, some suggest.returnUnFilteredSuggestionsIfFilteringIsNotSupported param to control it, I don't know... In SolrSuggester#initContextFilterQueryAnalyzer , I think that if CONTEXT_ANALYZER_FIELD_TYPE is explicitly given and wrong, we should fail-fast and throw exception instead of falling back to DocumentDictionaryFactory.CONTEXT_FIELD
          Hide
          Arcadius Ahouansou added a comment - - edited

          Hello Jan Høydahl

          Thank you very much for your comments.

          Have uploaded new version of the patch.

          Perhaps this property should be moved to some other Lucene class as a common global name for context field for all analyzers that supports context filtering,...

          I agree and I moved CONTEXTS_FIELD_NAME into Lucene's Lookup.java, meaning that it is now available to all Lookup implemetations.

          Regarding a request including suggesters that do not support filtering, I think it depends on its data whether the correct thing is to return an unfiltered response (open data) or to return nothing (sensitive data). Of course, the application has the power to pass suggest.dictionary accordingly if it knows that filtering is done. Alternatively, some suggest.returnUnFilteredSuggestionsIfFilteringIsNotSupported param to control it, I don't know...

          Not quite sure about this:
          Let's take the current solr 5.2.1:
          passing suggest.q=term&suggest.contestFilterQuery=ctx1 will return all suggestions matching the term ignoring ctx1 as context filtering is not yet implemented.

          I believe that keeping that behaviour for Lucene Suggesters that have not yet implemented context makes more sense to me.

          In case a user need context filtering to happen on a Lucene suggester not yet supporting filtering, they just need to implement it.

          Ideally and eventually, we will have context support for all Lucene suggesters, so I am not quite sure whether suggest.returnUnFilteredSuggestionsIfFilteringIsNotSupported is the way we should go.


          I think that if CONTEXT_ANALYZER_FIELD_TYPE is explicitly given and wrong, we should fail-fast and throw exception instead of falling back to DocumentDictionaryFactory.CONTEXT_FIELD

          I had thought a bit more about this.
          I believe that we do not really need the CONTEXT_ANALYZER_FIELD_TYPE config.
          One just needs to configure the context field in schema.xml with the needed query and index analyzers and all should work.
          In case one needs different context analyzers for different suggesters, we just need to configure different context fields in schema.xml.
          This has several advantages:

          • Simpler/less configuration.
          • Cleaner/more readable/less code to maintain.

          In case I am missing any use-case, please let me know

          Will let others chime in on the param names too. Which one do you like the best?
          suggest.contextFilterQuery
          suggest.contextQ
          suggest.fq
          suggest.context.fq

          The param name is this latest patch is still suggest.contextFilterQuery as we have not agreed yet on the right name to adopt.
          Maybe Robert Muir or Shalin Shekhar Mangar or Varun Thacker could help here

          Show
          Arcadius Ahouansou added a comment - - edited Hello Jan Høydahl Thank you very much for your comments. Have uploaded new version of the patch. Perhaps this property should be moved to some other Lucene class as a common global name for context field for all analyzers that supports context filtering,... I agree and I moved CONTEXTS_FIELD_NAME into Lucene's Lookup.java , meaning that it is now available to all Lookup implemetations. Regarding a request including suggesters that do not support filtering, I think it depends on its data whether the correct thing is to return an unfiltered response (open data) or to return nothing (sensitive data). Of course, the application has the power to pass suggest.dictionary accordingly if it knows that filtering is done. Alternatively, some suggest.returnUnFilteredSuggestionsIfFilteringIsNotSupported param to control it, I don't know... Not quite sure about this: Let's take the current solr 5.2.1: passing suggest.q=term&suggest.contestFilterQuery=ctx1 will return all suggestions matching the term ignoring ctx1 as context filtering is not yet implemented. I believe that keeping that behaviour for Lucene Suggesters that have not yet implemented context makes more sense to me. In case a user need context filtering to happen on a Lucene suggester not yet supporting filtering, they just need to implement it. Ideally and eventually, we will have context support for all Lucene suggesters, so I am not quite sure whether suggest.returnUnFilteredSuggestionsIfFilteringIsNotSupported is the way we should go. I think that if CONTEXT_ANALYZER_FIELD_TYPE is explicitly given and wrong, we should fail-fast and throw exception instead of falling back to DocumentDictionaryFactory.CONTEXT_FIELD I had thought a bit more about this. I believe that we do not really need the CONTEXT_ANALYZER_FIELD_TYPE config. One just needs to configure the context field in schema.xml with the needed query and index analyzers and all should work. In case one needs different context analyzers for different suggesters, we just need to configure different context fields in schema.xml . This has several advantages: Simpler/less configuration. Cleaner/more readable/less code to maintain. In case I am missing any use-case, please let me know Will let others chime in on the param names too. Which one do you like the best? suggest.contextFilterQuery suggest.contextQ suggest.fq suggest.context.fq The param name is this latest patch is still suggest.contextFilterQuery as we have not agreed yet on the right name to adopt. Maybe Robert Muir or Shalin Shekhar Mangar or Varun Thacker could help here
          Hide
          Jan Høydahl added a comment - - edited

          Lucene guys (Michael McCandless et.al) are you happy with moving the constant from AnalyzingInfixSuggester to Lookup and make it public, i.e:?

          Index: lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.java
          -  protected final static String CONTEXTS_FIELD_NAME = "contexts";
          Index: lucene/suggest/src/java/org/apache/lucene/search/suggest/Lookup.java
          +  public static final String CONTEXTS_FIELD_NAME = "contexts";
          

          If you don't want to make the constant part of the public Lucene API, an alternative is to duplicate the constant into org.apache.solr.spelling.suggest.LookupFactory for Solr's use case.

          The param name is this latest patch is still suggest.contextFilterQuery as we have not agreed yet on the right name to adopt

          Perhaps suggest.cfq?

          CONTEXT_ANALYZER_FIELD_TYPE

          I'm ok to delay such a param until there is a use case for it.

          Another question is choice of query parser for the context filter query: StandardQueryParser. Does it make sense to parse the context query using Solr's LuceneQParser, to fully support e.g. localParams, or is this just confusing here since we're not querying a Solr index? A use case would be ACLs, where the filter query could be hundreds of ACL terms, where you would be better off doing something like this:

          &suggest.cfq={!terms f=contexts}abc,def,ghi
          
          Show
          Jan Høydahl added a comment - - edited Lucene guys ( Michael McCandless et.al) are you happy with moving the constant from AnalyzingInfixSuggester to Lookup and make it public, i.e:? Index: lucene/suggest/src/java/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.java - protected final static String CONTEXTS_FIELD_NAME = "contexts" ; Index: lucene/suggest/src/java/org/apache/lucene/search/suggest/Lookup.java + public static final String CONTEXTS_FIELD_NAME = "contexts" ; If you don't want to make the constant part of the public Lucene API, an alternative is to duplicate the constant into org.apache.solr.spelling.suggest.LookupFactory for Solr's use case. The param name is this latest patch is still suggest.contextFilterQuery as we have not agreed yet on the right name to adopt Perhaps suggest.cfq ? CONTEXT_ANALYZER_FIELD_TYPE I'm ok to delay such a param until there is a use case for it. Another question is choice of query parser for the context filter query: StandardQueryParser . Does it make sense to parse the context query using Solr's LuceneQParser , to fully support e.g. localParams , or is this just confusing here since we're not querying a Solr index? A use case would be ACLs, where the filter query could be hundreds of ACL terms, where you would be better off doing something like this: &suggest.cfq={!terms f=contexts}abc,def,ghi
          Hide
          Arcadius Ahouansou added a comment - - edited

          Hello Jan Høydahl

          • I have changed the param name to suggest.cfq
          • the constant CONTEXTS_FIELD_NAME is still in Lookup.java awaiting confirmation from Michael McCandless
          • Regarding the local params
            {!terms f=contexts}abc,def,ghi

            I had a look and the local parameter parsing happens in QParser.getParser(String qstr, String defaultParser, SolrQueryRequest req).
            The current query parser framework relies on the solr schema for the query field and the analyzer.
            In our case, the analyzer is in the schema but the query/filter field is not.
            I have started implementing this and some reworking is required to reuse the existing QParser/QParserPlugin
            Note that in the specified case, TermsQParserPlugin generates a ConstantScoreQuery. We will have to wrap this into a BooleanQuery to send it to Lucene as the context filtering supports only BooleanQuery at the moment.
            I would suggest we add the local param feature as an enhancement later if that is OK.

          Please let me know if there is any other change to be done.

          Thank you.

          Arcadius

          Show
          Arcadius Ahouansou added a comment - - edited Hello Jan Høydahl I have changed the param name to suggest.cfq the constant CONTEXTS_FIELD_NAME is still in Lookup.java awaiting confirmation from Michael McCandless Regarding the local params {!terms f=contexts}abc,def,ghi I had a look and the local parameter parsing happens in QParser.getParser(String qstr, String defaultParser, SolrQueryRequest req) . The current query parser framework relies on the solr schema for the query field and the analyzer. In our case, the analyzer is in the schema but the query/filter field is not. I have started implementing this and some reworking is required to reuse the existing QParser/QParserPlugin Note that in the specified case, TermsQParserPlugin generates a ConstantScoreQuery. We will have to wrap this into a BooleanQuery to send it to Lucene as the context filtering supports only BooleanQuery at the moment. I would suggest we add the local param feature as an enhancement later if that is OK. Please let me know if there is any other change to be done. Thank you. Arcadius
          Hide
          Jan Høydahl added a comment -

          Sure, localparams can be a followup

          Show
          Jan Høydahl added a comment - Sure, localparams can be a followup
          Hide
          Arcadius Ahouansou added a comment -

          Adding in the test a reference to SOLR-7963 and SOLR-7964

          Show
          Arcadius Ahouansou added a comment - Adding in the test a reference to SOLR-7963 and SOLR-7964
          Hide
          Jan Høydahl added a comment -

          I think this is close to committable. If there are no objections to moving CONTEXTS_FIELD_NAME to Lookup.java by tomorrow, I'll do a round of final reviews & manual testing before committing.

          Show
          Jan Høydahl added a comment - I think this is close to committable. If there are no objections to moving CONTEXTS_FIELD_NAME to Lookup.java by tomorrow, I'll do a round of final reviews & manual testing before committing.
          Hide
          Arcadius Ahouansou added a comment -

          Updated from trunk and added testContextFilterIsTrimmed()

          Show
          Arcadius Ahouansou added a comment - Updated from trunk and added testContextFilterIsTrimmed()
          Hide
          Arcadius Ahouansou added a comment -

          Is there anything (maybe add more tests or do some specified manual testing ) I could do to help get this through ?
          Thank you very much.

          Arcadius.

          Show
          Arcadius Ahouansou added a comment - Is there anything (maybe add more tests or do some specified manual testing ) I could do to help get this through ? Thank you very much. Arcadius.
          Hide
          David Smiley added a comment -

          Looks cool. IMO Can't we just incorporate SOLR-7963 (support local-params) now? Solr makes supporting this easy; like a one-liner I think.

          Personally I like the "suggest.fq" name best, even considering your rationale as to why not. No big deal though.

          Show
          David Smiley added a comment - Looks cool. IMO Can't we just incorporate SOLR-7963 (support local-params) now? Solr makes supporting this easy; like a one-liner I think. Personally I like the "suggest.fq" name best, even considering your rationale as to why not. No big deal though.
          Hide
          Arcadius Ahouansou added a comment - - edited

          Hello David Smiley
          Thank you very much for your comment.

          • I could change the parameter to suggest.fq if Jan Høydahl and others are OK with that. For now, filtering is happening only on the context field, that is why cfq was chosen. But maybe in the future, we may have filtering on any arbitrary field in which case suggest.fq may make more sense? Are you OK Jan Høydahl for me to change to fq?
          • About SOLR-7963 , yes it could be included here. From first glance, it looks like a 1-liner, but it's not. Please look at my comment above. Or maybe am I missing something? Any contribution would be well appreciated. Note that there is an ignored failing test testLocalTermsQuery() for SOLR-7963 in the current patch
            I found it easier and more manageable to have it as an additional feature in a separate ticket. I wanted to start working on that as soon as this one is committed so that they could go into the very same release.
            I would be more than happy if David Smiley wants to help with SOLR-7963
            Thanks.
          Show
          Arcadius Ahouansou added a comment - - edited Hello David Smiley Thank you very much for your comment. I could change the parameter to suggest.fq if Jan Høydahl and others are OK with that. For now, filtering is happening only on the context field, that is why cfq was chosen. But maybe in the future, we may have filtering on any arbitrary field in which case suggest.fq may make more sense? Are you OK Jan Høydahl for me to change to fq? About SOLR-7963 , yes it could be included here. From first glance, it looks like a 1-liner, but it's not. Please look at my comment above. Or maybe am I missing something? Any contribution would be well appreciated. Note that there is an ignored failing test testLocalTermsQuery() for SOLR-7963 in the current patch I found it easier and more manageable to have it as an additional feature in a separate ticket. I wanted to start working on that as soon as this one is committed so that they could go into the very same release. I would be more than happy if David Smiley wants to help with SOLR-7963 Thanks.
          Hide
          Arcadius Ahouansou added a comment -

          I have just started work on SOLR-7963 as part of SOLR-7888. I will upload a new patch here when done.
          Thanks.

          Show
          Arcadius Ahouansou added a comment - I have just started work on SOLR-7963 as part of SOLR-7888 . I will upload a new patch here when done. Thanks.
          Hide
          Arcadius Ahouansou added a comment -
          • Patch now includes both SOLR-7888 and SOLR-7963.
          • The parameter has been changed to suggest.fq
          Show
          Arcadius Ahouansou added a comment - Patch now includes both SOLR-7888 and SOLR-7963 . The parameter has been changed to suggest.fq
          Hide
          Jan Høydahl added a comment -

          Please don't delete attachments when uploading new ones.
          I think it makes sense to commit a version without localParams support first, as there are still some unresolved issues with that integration:

          • Solr's Qparsers assume that you query the index specified in schema.xml, but we don't
          • It is kind of a hack to force Lucene's AnalyzingSuggester to use same contexts field name as the source schema field name we pull data from - it satisfies QParser's need for a DF which exists in schema, but there are more problems:
          • If the source fieldType in schema.xml is e.g. text, then that Analyser is used for query, with lowercasing etc. Problem is that the contexts field for the Suggester is always indexed as String, meaning that a source string "ABC" will not match a query "ABC" since it will be lowercased and match only "abc"

          One solution is to extend Lucene's suggesters to be able to index contexts field with a custom analyzer, given in constructor. Then we could match things up and get it working. However, I don't like the hack of accidentally naming the two fields the same to get QParser working, so ideally we should then create a SuggesterQParser or something which accepts DF not in schema and is explicit about Analysers. But then letting people switch parser with localParam will bring them trouble again since that parser will require the field to exist in schema etc...

          So for now let's analyze the context query as String, using Lucene's query parser, and leave to a future jira to add more flexibility. I'll upload a patch shortly.

          Show
          Jan Høydahl added a comment - Please don't delete attachments when uploading new ones. I think it makes sense to commit a version without localParams support first, as there are still some unresolved issues with that integration: Solr's Qparsers assume that you query the index specified in schema.xml, but we don't It is kind of a hack to force Lucene's AnalyzingSuggester to use same contexts field name as the source schema field name we pull data from - it satisfies QParser's need for a DF which exists in schema, but there are more problems: If the source fieldType in schema.xml is e.g. text , then that Analyser is used for query, with lowercasing etc. Problem is that the contexts field for the Suggester is always indexed as String , meaning that a source string "ABC" will not match a query "ABC" since it will be lowercased and match only "abc" One solution is to extend Lucene's suggesters to be able to index contexts field with a custom analyzer, given in constructor. Then we could match things up and get it working. However, I don't like the hack of accidentally naming the two fields the same to get QParser working, so ideally we should then create a SuggesterQParser or something which accepts DF not in schema and is explicit about Analysers. But then letting people switch parser with localParam will bring them trouble again since that parser will require the field to exist in schema etc... So for now let's analyze the context query as String, using Lucene's query parser, and leave to a future jira to add more flexibility. I'll upload a patch shortly.
          Hide
          Jan Høydahl added a comment -

          Some details on this patch:

          • Changed back to suggest.cfq
          • Avoid test-schema changes, use my_contexts_t and my_contexts_s instead
          • Skip (for now) the configurable contexts field name in Lucene classes
          • Add the CONTEXTS_FIELD_NAME constant to AnalyzingInfixLookupFactory to avoid making the Lucene constant public
          • Now we don't get Solr error code when searching with invalid field names, so I changed the test to instead verify that we get 0 hits.

          Query analyser is now always created as

          contextFilterQueryAnalyzer = new TokenizerChain(new StandardTokenizerFactory(Collections.EMPTY_MAP), null);

          It seems to work well, however not sure if there are use cases where it don't align with the index-analyzer of the Suggester index.

          Show
          Jan Høydahl added a comment - Some details on this patch: Changed back to suggest.cfq Avoid test-schema changes, use my_contexts_t and my_contexts_s instead Skip (for now) the configurable contexts field name in Lucene classes Add the CONTEXTS_FIELD_NAME constant to AnalyzingInfixLookupFactory to avoid making the Lucene constant public Now we don't get Solr error code when searching with invalid field names, so I changed the test to instead verify that we get 0 hits. Query analyser is now always created as contextFilterQueryAnalyzer = new TokenizerChain( new StandardTokenizerFactory(Collections.EMPTY_MAP), null ); It seems to work well, however not sure if there are use cases where it don't align with the index-analyzer of the Suggester index.
          Hide
          Arcadius Ahouansou added a comment -

          Hello Jan Høydahl

          Thank you for uploading the updated version.
          I am pretty happy with your changes and I would suggest we go ahead with that.
          There are few minor changes that are needed:

          • -        SuggesterResult suggesterResult = suggester.getSuggestions(options);
            +        SuggesterResult suggesterResult = suggester.getSuggestions(options, rb.req);

            The additional param rb.req was needed only for SOLR-7963, so it can be removed here and in all subsequent methods.

          • As we no longer rely on the schema, maybe we should review/rename the test testContextFilterUsesAnalyzerFromSchema()?

          I am not quite sure whether the * Clone of CONTEXTS_FIELD_NAME in AnalyzingInfixSuggester is an excellent idea, but that it very minor.

          So, please let's go ahead with this.
          Thank you very much.

          I will address your comments regarding SOLR-7963 later on.

          Show
          Arcadius Ahouansou added a comment - Hello Jan Høydahl Thank you for uploading the updated version. I am pretty happy with your changes and I would suggest we go ahead with that. There are few minor changes that are needed: - SuggesterResult suggesterResult = suggester.getSuggestions(options); + SuggesterResult suggesterResult = suggester.getSuggestions(options, rb.req); The additional param rb.req was needed only for SOLR-7963 , so it can be removed here and in all subsequent methods. As we no longer rely on the schema, maybe we should review/rename the test testContextFilterUsesAnalyzerFromSchema() ? I am not quite sure whether the * Clone of CONTEXTS_FIELD_NAME in AnalyzingInfixSuggester is an excellent idea, but that it very minor. So, please let's go ahead with this. Thank you very much. I will address your comments regarding SOLR-7963 later on.
          Hide
          Arcadius Ahouansou added a comment -

          Did minor changes suggested earlier:

          • removed rb.req from methods calls
          • renamed testContextFilterUsesAnalyzerFromSchema() -> testContextFilterUsesAnalyzer()
          Show
          Arcadius Ahouansou added a comment - Did minor changes suggested earlier: removed rb.req from methods calls renamed testContextFilterUsesAnalyzerFromSchema() -> testContextFilterUsesAnalyzer()
          Hide
          ASF subversion and git services added a comment -

          Commit 1707907 from janhoy@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1707907 ]

          SOLR-7888: Analyzing suggesters can now filter suggestions by a context field

          Show
          ASF subversion and git services added a comment - Commit 1707907 from janhoy@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1707907 ] SOLR-7888 : Analyzing suggesters can now filter suggestions by a context field
          Hide
          Jan Høydahl added a comment -

          Committed current state to trunk. Let it bake for some time before backport...

          Show
          Jan Høydahl added a comment - Committed current state to trunk. Let it bake for some time before backport...
          Hide
          Arcadius Ahouansou added a comment -

          Thank you very much Jan Høydahl

          Show
          Arcadius Ahouansou added a comment - Thank you very much Jan Høydahl
          Hide
          ASF subversion and git services added a comment -

          Commit 1708103 from janhoy@apache.org in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1708103 ]

          SOLR-7888: Analyzing suggesters can now filter suggestions by a context field (backport)

          Show
          ASF subversion and git services added a comment - Commit 1708103 from janhoy@apache.org in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1708103 ] SOLR-7888 : Analyzing suggesters can now filter suggestions by a context field (backport)
          Hide
          Jan Høydahl added a comment -
          Show
          Jan Høydahl added a comment - Added documentation to refguide: https://cwiki.apache.org/confluence/display/solr/Suggester
          Hide
          Jan Høydahl added a comment -

          Arcadius Ahouansou and Cassandra Targett, I'd appreciate a review of my changes to the refguide page above.

          Show
          Jan Høydahl added a comment - Arcadius Ahouansou and Cassandra Targett , I'd appreciate a review of my changes to the refguide page above.
          Hide
          Arcadius Ahouansou added a comment -

          Hello Jan Høydahl.

          The Wiki looks OK to me.
          I was just wondering whether it is worth adding that the build command will throw an exception if context configuration is added to a suggester that does not support context.

          Thank you very much.

          Show
          Arcadius Ahouansou added a comment - Hello Jan Høydahl . The Wiki looks OK to me. I was just wondering whether it is worth adding that the build command will throw an exception if context configuration is added to a suggester that does not support context. Thank you very much.
          Hide
          Jan Høydahl added a comment -

          Thanks for the review. It's good to know that you'll get an exception instead of silence, but I don't think it is relevant for the refGuide..

          Show
          Jan Høydahl added a comment - Thanks for the review. It's good to know that you'll get an exception instead of silence, but I don't think it is relevant for the refGuide..
          Hide
          Arcadius Ahouansou added a comment -

          Thank you very much Jan Høydahl for your very valuable help, support and contribution on this issue.

          Show
          Arcadius Ahouansou added a comment - Thank you very much Jan Høydahl for your very valuable help, support and contribution on this issue.
          Hide
          Rajesh Kapur added a comment -

          Hi,

          Can we pass multiple fields to be filtered out in ContextField parameter? If yes, please give me example.

          Show
          Rajesh Kapur added a comment - Hi, Can we pass multiple fields to be filtered out in ContextField parameter? If yes, please give me example.

            People

            • Assignee:
              Jan Høydahl
              Reporter:
              Arcadius Ahouansou
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development