Solr / SOLR-5528

Change New Suggester Response and minor cleanups

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7, 5.0
    • Labels:
      None

      Description

      It would be nice to have a simplified response format for the new Suggester Component.
      The proposed format is as follows:
      XML:

      <?xml version="1.0" encoding="UTF-8"?>
      <response>
         <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">32</int>
         </lst>
         <str name="command">build</str>
         <lst name="suggest">
            <lst name="ele">
               <int name="numFound">1</int>
               <arr name="suggestions">
                  <lst>
                     <str name="term">electronics and computer1</str>
                     <long name="weight">2199</long>
                     <str name="payload" />
                  </lst>
               </arr>
            </lst>
         </lst>
      </response>
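A minimal sketch of how a client might read the proposed XML format, using only Python's standard library. The function name `parse_suggest_xml` and the output shape are assumptions made for illustration; the sample is the response above with the XML declaration left out.

```python
import xml.etree.ElementTree as ET

# Sample copied from the proposed XML response (declaration omitted).
SAMPLE_XML = """<response>
   <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">32</int>
   </lst>
   <str name="command">build</str>
   <lst name="suggest">
      <lst name="ele">
         <int name="numFound">1</int>
         <arr name="suggestions">
            <lst>
               <str name="term">electronics and computer1</str>
               <long name="weight">2199</long>
               <str name="payload" />
            </lst>
         </arr>
      </lst>
   </lst>
</response>"""


def parse_suggest_xml(xml_text):
    """Hypothetical helper: map each query token to its list of suggestions."""
    root = ET.fromstring(xml_text)
    result = {}
    # Each child <lst> of <lst name="suggest"> is keyed by the query token.
    for token in root.findall("./lst[@name='suggest']/lst"):
        suggestions = []
        for entry in token.findall("./arr[@name='suggestions']/lst"):
            # An empty element such as <str name="payload" /> has text None.
            fields = {child.get("name"): child.text or "" for child in entry}
            fields["weight"] = int(fields["weight"])
            suggestions.append(fields)
        result[token.get("name")] = suggestions
    return result


parsed = parse_suggest_xml(SAMPLE_XML)
print(parsed)
```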
      

      JSON:

      {
          "responseHeader": {
              "status": 0,
              "QTime": 30
          },
          "command": "build",
          "suggest": {
              "ele": {
                  "numFound": 1,
                  "suggestions": [
                      {
                          "term": "electronics and computer1",
                          "weight": 2199,
                          "payload": ""
                      }
                  ]
              }
          }
      }
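As a rough illustration of consuming the proposed JSON format, the sketch below pulls the suggested terms for one query token, highest weight first. The helper name `top_terms` is hypothetical, and sorting by weight on the client is an assumption, not something the proposal specifies.

```python
import json

# Sample copied from the proposed JSON response.
SAMPLE_JSON = """
{
    "responseHeader": {"status": 0, "QTime": 30},
    "command": "build",
    "suggest": {
        "ele": {
            "numFound": 1,
            "suggestions": [
                {"term": "electronics and computer1", "weight": 2199, "payload": ""}
            ]
        }
    }
}
"""


def top_terms(response_text, token):
    """Hypothetical helper: terms suggested for `token`, heaviest first."""
    body = json.loads(response_text)
    section = body["suggest"][token]
    ranked = sorted(section["suggestions"], key=lambda s: s["weight"], reverse=True)
    return [s["term"] for s in ranked]


terms = top_terms(SAMPLE_JSON, "ele")
print(terms)
```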
      
      1. SOLR-5528.patch
        34 kB
        Areek Zillur

        Issue Links

          Activity

          Areek Zillur added a comment -

          Thanks for committing this!

          Robert Muir added a comment -

          Thanks Areek!

          ASF subversion and git services added a comment -

          Commit 1551759 from Robert Muir in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1551759 ]

          SOLR-5528: improve response format of the new SuggestComponent

          ASF subversion and git services added a comment -

          Commit 1551753 from Robert Muir in branch 'dev/trunk'
          [ https://svn.apache.org/r1551753 ]

          SOLR-5528: improve response format of the new SuggestComponent

          Robert Muir added a comment -

          Patch looks good Areek. I'll commit later today if there are no objections.

          Areek Zillur added a comment -

          NOTE: I mistakenly referred to SOLR-5528 (current jira) instead of the intended SOLR-5529.

          Areek Zillur added a comment - - edited

          Hey Erick, just to be clear, the patch for the proposal is already done (SOLR-5529). The only catch is that it depends on this patch being checked in, because it relies on the new suggester response format. I will stress again that these changes are only for the NEW suggester component. You can help out by reviewing it.

          Erick Erickson added a comment -

          Areek:

          That works for me!

          My suggestions (pun intended) were leveraging the fact that it "just works" now, so adding the name would take minimal effort. That said, I wasn't really happy with the fact that the "name" elements had to be the same for the different components or you get a mysterious NPE (<str name="name">suggest</str>). That indicated that this wasn't thought out, just happenstance and thus quite possibly fragile.

          But I can't put the effort into finding out why or fixing it in the near future, so I was punting. Your idea is much cleaner and I'll be happy to help work on it... when I can, which won't be for some time.

          Go for it!

          Erick

          Areek Zillur added a comment - - edited

          I created and uploaded the patch (SOLR-5529) which will allow users to specify multiple suggesters within a suggesterComponent. It should work in standalone and distributed mode. It was more work than I thought but hopefully it will take care of the use-cases, while letting a single component manage the state.

          Areek Zillur added a comment - - edited

          Sorry for joining the discussion so late!

          My thoughts:

          • I think using several suggest components to solve any use case is not ideal! The same use cases (as mentioned by Erick) can be solved if there were a way to get suggestions from multiple dictionaries in one go.

          So in order to fulfill these use cases, I believe we should allow users to 'query' multiple suggesters in one single component.

          Example config:

          <searchComponent class="solr.SuggestComponent" name="suggest">
              <lst name="suggester">
                <str name="name">name_suggester</str>
                <str name="lookupImpl">FuzzyLookupFactory</str>
                <str name="dictionaryImpl">DocumentDictionaryFactory</str>
                <str name="field">cat</str>
                <str name="weightField">price</str>
                <str name="storeDir">suggest_fuzzy_doc_dict_payload</str>
                <str name="suggestAnalyzerFieldType">text</str>
                <str name="buildOnCommit">true</str>
              </lst>
          
              <lst name="suggester">
                <str name="name">feature_suggester</str>
                <str name="dictionaryImpl">DocumentExpressionDictionaryFactory</str>
                <str name="lookupImpl">FuzzyLookupFactory</str>
                <str name="field">cat</str>
                <str name="weightExpression">((price * 2) + weight)</str>
                <str name="sortField">weight</str>
                <str name="sortField">price</str>
                <str name="storeDir">suggest_fuzzy_doc_expr_dict</str>
                <str name="suggestAnalyzerFieldType">text</str>
                <str name="buildOnCommit">true</str>
              </lst>
          </searchComponent>
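To make the link between this config and the query parameters concrete, here is a small sketch that lists the suggester names declared in a component definition. It is only an illustration: the helper `suggester_names` is hypothetical, and the sample is a trimmed copy of the config above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example config (only a few settings kept).
CONFIG = """<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">name_suggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  </lst>
  <lst name="suggester">
    <str name="name">feature_suggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentExpressionDictionaryFactory</str>
  </lst>
</searchComponent>"""


def suggester_names(config_text):
    """Hypothetical helper: names of all suggesters in one component, in order."""
    root = ET.fromstring(config_text)
    names = []
    for block in root.findall("./lst[@name='suggester']"):
        name_el = block.find("./str[@name='name']")
        # A block without a name entry is reported as None rather than skipped.
        names.append(name_el.text if name_el is not None else None)
    return names


names = suggester_names(CONFIG)
print(names)
```

These are the values a client would pass as suggest.dictionary.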
          

          And then the user can query as follows:

           /suggest?suggest.q=blah&suggest.dictionary=name_suggester&suggest.dictionary=feature_suggester 
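A short sketch of building that query string on the client side. The repeated suggest.dictionary parameter is taken from the example above; the helper name `build_suggest_query` and the default `/suggest` path are assumptions for illustration.

```python
from urllib.parse import urlencode


def build_suggest_query(query, dictionaries, path="/suggest"):
    """Hypothetical helper: one suggest.dictionary parameter per suggester."""
    params = [("suggest.q", query)]
    # A list of pairs lets the same parameter name appear more than once.
    params += [("suggest.dictionary", name) for name in dictionaries]
    return path + "?" + urlencode(params)


url = build_suggest_query("blah", ["name_suggester", "feature_suggester"])
print(url)
```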

          In order for the user to distinguish the suggestions, we can change the response format to:

          {
              "responseHeader": {
                  "status": 0,
                  "QTime": 30
              },
              "suggest": {
                  "name_suggester": {
                      "ele": {
                          "numFound": 1,
                          "suggestions": [
                              {
                                  "term": "electronics and computer1",
                                  "weight": 2199,
                                  "payload": ""
                              }
                          ]
                      }
                  },
                  "feature_suggester": {
                      "ele": {
                          "numFound": 1,
                          "suggestions": [
                              {
                                  "term": "electronics and computer1",
                                  "weight": 2199,
                                  "payload": ""
                              }
                          ]
                      }
                  }
              }
          }
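The point of keying the response by suggester name is that a client can pick out each section by name instead of by position. The sketch below shows that under the assumption that the suggester names appear as quoted keys (as valid JSON requires); the helper `terms_by_suggester` is hypothetical.

```python
import json

# Sample following the proposed multi-suggester layout, with quoted keys.
SAMPLE = """
{
    "responseHeader": {"status": 0, "QTime": 30},
    "suggest": {
        "name_suggester": {
            "ele": {
                "numFound": 1,
                "suggestions": [
                    {"term": "electronics and computer1", "weight": 2199, "payload": ""}
                ]
            }
        },
        "feature_suggester": {
            "ele": {
                "numFound": 1,
                "suggestions": [
                    {"term": "electronics and computer1", "weight": 2199, "payload": ""}
                ]
            }
        }
    }
}
"""


def terms_by_suggester(response_text, token):
    """Hypothetical helper: map suggester name -> terms suggested for `token`."""
    body = json.loads(response_text)
    out = {}
    for name, tokens in body["suggest"].items():
        # A suggester with nothing for this token yields an empty list.
        section = tokens.get(token, {})
        out[name] = [s["term"] for s in section.get("suggestions", [])]
    return out


grouped = terms_by_suggester(SAMPLE, "ele")
print(grouped)
```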
          

          By making sure this is done at the component level (single, not multiple), we can also ensure that it will work in all cases that it should (distributed).

          There are some changes necessary to support the proposed approach, besides just the response format. Hence, I believe it should be done in a separate jira (will create and upload the patch soon).
          I also don't intend to add this to the old suggester (based on the spellcheck component), as that component is doing too many things already.

          Thoughts?

          Erick Erickson added a comment -

          bq: Remember: this isn't spellcheck, so "bloating the index" doesn't really exist. It's FSTs.

          Well, you have to copy all the individual fields into an extra field upon which you build the FST if you do the "copy all fields you want to contribute to suggestions into a common field and suggest on that" work-around for getting suggestions from multiple fields.

          About LUCENE-5350. Hmmm, interesting. I'm not entirely sure what the possibilities are there. At a very quick glance it looks like it covers complementary but not identical cases. But I could very well be mistaken.

          None of this is a new capability. It's all doable in the present code by relying on the sections of the suggest response coming back in the same order as the components were defined in the suggest request handler. It's just about adding one more bit to the suggester response to make it easier to identify which component generated that section, that's all.

          I don't much like relying on position and the loose coupling between the order of elements in the component chain and the order of the suggest sections in the response. I guess that if we're willing to guarantee that ordering in this and future releases, then the current capabilities will work robustly.

          Robert Muir added a comment -

          I'm not sure: it seems like such use cases can be dealt with properly by stuff like LUCENE-5350?

          Remember: this isn't spellcheck, so "bloating the index" doesn't really exist. It's FSTs.

          Erick Erickson added a comment -

          bq: I don't think there is a real use case for multiple suggesters

          Here are two:

          1> You want to get suggestions from more than one field without bloating your index by copying all the fields into a single "suggest" field, and without making multiple requests. I've seen this go by on the users' list multiple times.

          2> You want to display suggestions from different fields differently, say giving more weight to ones from "title".

          As for complexity, I don't see how adding one field to the response that may be ignored is adding much in the way of complexity, but then I'm not doing the work so....

          All the rest is supported OOB, so the use-cases are realizable even now without adding the name of the component. Adding the name seems useful though.

          Robert Muir added a comment -

          I don't think there is a real use case for multiple suggesters, Erick. That sounds like a bad way to solve some other problem.

          I don't think we should complicate the format with this stuff.

          Erick Erickson added a comment -

          Robert:

          I think it's the same consideration though. The new suggester subclasses SearchComponent. You can define multiple SearchComponents in a requestHandler "components" section and they'd be added to the response as separate sections. So adding the name to the section added by each component would help identify where it came from and you wouldn't have to rely on order and "meta knowledge" of the order defined in solrconfig.xml. I suspect processing distributed suggestions would be more robust too, but that's just a SWAG.

          I'm not insisting on this mind you, but I was just looking at this and it seems like there's no downside and potential upside so I wanted to suggest it. The fact that I put my code in SpellCheckComponent was just illustrative. Although it could go there too if it's not a back-compat issue I suppose.

          Robert Muir added a comment -

          Erick: Areek isn't referring to SpellCheckComponent here.

          This is about the new SuggestComponent.

          Erick Erickson added a comment -

          Areek:

          As luck would have it, a client noticed that you can string two or more suggesters together by listing multiple components in the request handler <components> section. I posted a detailed account on the dev list. This allows getting suggestions from multiple fields (or whatever components you define) in separate sections of the response. I'm not sure whether this is intended behavior or serendipitous.

          The crux of the matter is that I can see it being useful to return the name of the component in the suggest section, perhaps as a sibling to "numFound". It would help disambiguate the response and might help with distributed processing, but I'm guessing on this last point.

          If we're changing the response format anyway, do you see any harm in putting this added bit in? I hacked a very quick test in by changing toNamedList in SpellCheckComponent, adding getName() to the call, like this:

          NamedList suggestions = toNamedList(getName(), [all the rest of the parameters just as now]).

          then adding the string from getName() into the named list. getName(), of course, is just the bits from

          <arr name="components">
            <str>name</str>
            <str>features</str>
          </arr>

          i.e. "name" and "features" in this example.

          FWIW,
          Erick

          Areek Zillur added a comment -

          Initial patch:

          • response format change
          • stricter NamedList typing
          • fixed log and error messages
          • renamed DistributedSuggesterComponentTest to DistributedSuggestComponentTest
          • exposed sizeInBytes method for extensibility

            People

            • Assignee:
              Unassigned
              Reporter:
              Areek Zillur
            • Votes:
              0
              Watchers:
              4
