Solr
  1. Solr
  2. SOLR-3029

Poor json formatting of spelling collation info

    Details

    • Type: Bug Bug
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.9, 5.0
    • Component/s: spellchecker
    • Labels:
      None

      Description

      "spellcheck": {
          "suggestions": [
          "dalllas",
          {
      <snip>
              {
                  "word": "canallas",
                  "freq": 1
              }
              ]
          },
          "correctlySpelled",
          false,
          "collation",
          "dallas"
          ]
      }
      

      The correctlySpelled and collation key/values are stored as consecutive elements in an array - quite odd. Is there a reason isn't not a key/value map like most things?

      1. SOLR-3029.patch
        30 kB
        Nalini Kartha
      2. SOLR-3029.patch
        0.8 kB
        Rafał Kuć

        Activity

        Hide
        Yonik Seeley added a comment -

        Looks like someone accidentally used a NamedList (and thus signaled that preserving order was most important) vs SimpleOrderedMap (which signals that access by key is most important). This is sometimes overlooked since they both serialize to the same thing in XML.

        Show
        Yonik Seeley added a comment - Looks like someone accidentally used a NamedList (and thus signaled that preserving order was most important) vs SimpleOrderedMap (which signals that access by key is most important). This is sometimes overlooked since they both serialize to the same thing in XML.
        Hide
        Antony Stubbs added a comment -

        I imagine this will also be in 3.5?

        Show
        Antony Stubbs added a comment - I imagine this will also be in 3.5?
        Hide
        Rafał Kuć added a comment -

        Simple patch correcting the isssue.

        Show
        Rafał Kuć added a comment - Simple patch correcting the isssue.
        Hide
        Yonik Seeley added a comment -

        Any objections to changing this for both 3.6 and 4.0?
        It's technically a back compat break, but it was messed up to begin with (and the change should not affect clients using javabin or xml).

        Show
        Yonik Seeley added a comment - Any objections to changing this for both 3.6 and 4.0? It's technically a back compat break, but it was messed up to begin with (and the change should not affect clients using javabin or xml).
        Hide
        Yonik Seeley added a comment -

        OK, this may be too much for 3.6.
        I tried an example and it looks like there are a couple of places with this problem.

        I edited solr.xml so the name field looked like

          <field name="name">Solr, the Enterprise Search Server solra solrb abcdefgx abcdefgy</field>
        

        Then I tried the following spellcheck command:

        http://localhost:8983/solr/spell?q=abcdefgq%20solrz&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.collateExtendedResults=true&spellcheck.count=3&spellcheck.maxCollations=3&wt=json&indent=true

          "spellcheck":{
            "suggestions":[
              "abcdefgq",{
                "numFound":2,
                "startOffset":0,
                "endOffset":8,
                "suggestion":["abcdefgx",
                  "abcdefgy"]},
              "solrz",{
                "numFound":3,
                "startOffset":9,
                "endOffset":14,
                "suggestion":["solra",
                  "solrb",
                  "solr"]},
              "collation",[
                "collationQuery","abcdefgx solra",
                "hits",0,
                "misspellingsAndCorrections",[
                  "abcdefgq","abcdefgx",
                  "solrz","solra"]]]}}
        

        Apologies if I'm misinterpreting some things - I don't have much experience with the spellchecker stuff (other than trying to clean up the tests in the past).
        Observations:

        • Unless order is really important, "suggestions" should be a map
        • same for "collation" and "misspellingsAndCorrections"
        • why is "collation" inside "suggestions" along with other words? should this be one level higher?
        • why isn't this giving me multiple collations... am I misunderstanding the spellcheck.maxCollations parameter
        • why aren't multiple suggestions returned in misspellingsAndCorrections? (and what's the purpose of this seemingly redundant info anyway?)
        • I briefly tried distributed search by adding &shards=localhost:8983/solr&shards.qt=/spell and I get
            "error":{
              "msg":"isShard is only acceptable with search handlers",
              "code":400}}
          
        Show
        Yonik Seeley added a comment - OK, this may be too much for 3.6. I tried an example and it looks like there are a couple of places with this problem. I edited solr.xml so the name field looked like <field name= "name" >Solr, the Enterprise Search Server solra solrb abcdefgx abcdefgy</field> Then I tried the following spellcheck command: http://localhost:8983/solr/spell?q=abcdefgq%20solrz&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.collateExtendedResults=true&spellcheck.count=3&spellcheck.maxCollations=3&wt=json&indent=true "spellcheck" :{ "suggestions" :[ "abcdefgq" ,{ "numFound" :2, "startOffset" :0, "endOffset" :8, "suggestion" :[ "abcdefgx" , "abcdefgy" ]}, "solrz" ,{ "numFound" :3, "startOffset" :9, "endOffset" :14, "suggestion" :[ "solra" , "solrb" , "solr" ]}, "collation" ,[ "collationQuery" , "abcdefgx solra" , "hits" ,0, "misspellingsAndCorrections" ,[ "abcdefgq" , "abcdefgx" , "solrz" , "solra" ]]]}} Apologies if I'm misinterpreting some things - I don't have much experience with the spellchecker stuff (other than trying to clean up the tests in the past). Observations: Unless order is really important, "suggestions" should be a map same for "collation" and "misspellingsAndCorrections" why is "collation" inside "suggestions" along with other words? should this be one level higher? why isn't this giving me multiple collations... am I misunderstanding the spellcheck.maxCollations parameter why aren't multiple suggestions returned in misspellingsAndCorrections? (and what's the purpose of this seemingly redundant info anyway?) I briefly tried distributed search by adding &shards=localhost:8983/solr&shards.qt=/spell and I get "error" :{ "msg" : "isShard is only acceptable with search handlers" , "code" :400}}
        Hide
        James Dyer added a comment -

        Yonik,

        I can answer some of your questions. I do agree the spellcheck response format leaves something to be desired and maybe 4.0 is a good time to break backwards and improve it.

        Unless order is really important, "suggestions" should be a map

        I don't see why order would matter here, although some users might like to see the corrections listed in the order they appeared in the query.

        same for "collation"

        The collations are ranked, so order is important.

        and "misspellingsAndCorrections"

        The order shouldn't matter unless users are picky about the corrections being presented in the order they occur in the query.

        why is "collation" inside "suggestions" along with other words? should this be one level higher?

        This always confused me too. I agree it should be one level higher.

        why isn't this giving me multiple collations

        This is a bug. See SOLR-2853.

        why aren't multiple suggestions returned in misspellingsAndCorrections? (and what's the purpose ...?)

        This is nested with the Collation and gives details, for that particular collation, which misspelled word got which replacement. This makes it easy for clients to generate messages like "no results found for abcdefgq ... Showing abcdefgx instead!" You can suppress this information by not specifying "spellcheck.collateExtendedResults=true". For users (like me) who are interested in the collations only and don't care about individual-word corrections, it would be nice if we could suppress the first section of the response entirely.

        I briefly tried distributed search...

        DistributedSpellCheckComponentTest is supposed to detect problems like this but maybe something is going on and there is a bug this test isn't catching?

        For what its worth you had voiced some misgivings about the JSON format when the multiple-collations feature was added. At that time I supplied a quick patch to address your concerns. I'm not sure if that patch fixes the problem described here. See SOLR-2010 and your comment from Oct 16, 2010 and the (now outdated, never committed) patch I supplied on Oct 20.

        The patch on this issue causes multiple test failures although I didn't look into them.

        Show
        James Dyer added a comment - Yonik, I can answer some of your questions. I do agree the spellcheck response format leaves something to be desired and maybe 4.0 is a good time to break backwards and improve it. Unless order is really important, "suggestions" should be a map I don't see why order would matter here, although some users might like to see the corrections listed in the order they appeared in the query. same for "collation" The collations are ranked, so order is important. and "misspellingsAndCorrections" The order shouldn't matter unless users are picky about the corrections being presented in the order they occur in the query. why is "collation" inside "suggestions" along with other words? should this be one level higher? This always confused me too. I agree it should be one level higher. why isn't this giving me multiple collations This is a bug. See SOLR-2853 . why aren't multiple suggestions returned in misspellingsAndCorrections? (and what's the purpose ...?) This is nested with the Collation and gives details, for that particular collation, which misspelled word got which replacement. This makes it easy for clients to generate messages like "no results found for abcdefgq ... Showing abcdefgx instead!" You can suppress this information by not specifying "spellcheck.collateExtendedResults=true". For users (like me) who are interested in the collations only and don't care about individual-word corrections, it would be nice if we could suppress the first section of the response entirely. I briefly tried distributed search... DistributedSpellCheckComponentTest is supposed to detect problems like this but maybe something is going on and there is a bug this test isn't catching? For what its worth you had voiced some misgivings about the JSON format when the multiple-collations feature was added. At that time I supplied a quick patch to address your concerns. I'm not sure if that patch fixes the problem described here. See SOLR-2010 and your comment from Oct 16, 2010 and the (now outdated, never committed) patch I supplied on Oct 20. The patch on this issue causes multiple test failures although I didn't look into them.
        Hide
        Yonik Seeley added a comment -

        Thanks for the explanations! (and yeah, I had lost track of the older issues...)

        The collations are ranked, so order is important.

        I guess I meant for a single collation:

        "collation",[
                "collationQuery","abcdefgx solra",
                "hits",0,
                "misspellingsAndCorrections",[...
        

        It seems like that should be a map?

        {"collationQuery" : "abcdefgx solra",
         "hits" : 0
         "misspellingsAndCorrections" : {...
        }
        

        And if there are multiple collations, there should be an array of those maps?

        I do agree the spellcheck response format leaves something to be desired and maybe 4.0 is a good time to break backwards and improve it.

        +1, go for it!

        Show
        Yonik Seeley added a comment - Thanks for the explanations! (and yeah, I had lost track of the older issues...) The collations are ranked, so order is important. I guess I meant for a single collation: "collation" ,[ "collationQuery" , "abcdefgx solra" , "hits" ,0, "misspellingsAndCorrections" ,[... It seems like that should be a map? { "collationQuery" : "abcdefgx solra" , "hits" : 0 "misspellingsAndCorrections" : {... } And if there are multiple collations, there should be an array of those maps? I do agree the spellcheck response format leaves something to be desired and maybe 4.0 is a good time to break backwards and improve it. +1, go for it!
        Hide
        Dominique Béjean added a comment - - edited

        What about this issue ? It looks like nothing append during the last year.
        Here is a sample of an invalid json response for the collation with Solr 4.1

        "spellcheck":{
            "suggestions":{
              ...
              "correctlySpelled":false,
              "collation":"lang  louis",
              "collation":"long  loic",
              "collation":"land  loic"}},
              ...
        

        Collation should be an array

        ...
            "collation": [
                "lang  louis",
                "long  loic",
                "land  loic"
            ]
        ...
        

        +1 !

        Show
        Dominique Béjean added a comment - - edited What about this issue ? It looks like nothing append during the last year. Here is a sample of an invalid json response for the collation with Solr 4.1 "spellcheck" :{     "suggestions" :{      ...       "correctlySpelled" : false ,       "collation" : "lang  louis" ,       "collation" : " long  loic" ,       "collation" : "land  loic" }}, ... Collation should be an array ... "collation" : [ "lang  louis" , " long  loic" , "land  loic" ] ... +1 !
        Hide
        Jérôme Termes added a comment -

        Same for me
        +1, go for it!

        Show
        Jérôme Termes added a comment - Same for me +1, go for it!
        Hide
        Steve Rowe added a comment -

        Bulk move 4.4 issues to 4.5 and 5.0

        Show
        Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
        Hide
        Uwe Schindler added a comment -

        Move issue to Solr 4.9.

        Show
        Uwe Schindler added a comment - Move issue to Solr 4.9.
        Hide
        Nalini Kartha added a comment - - edited

        New patch taking into account comments on the JIRA -

        • Moved correctlySpelled boolean and collation list out of suggestions. Left suggestions and collations as NamedLists based on previous comments (maybe suggestions can also be made a SimpleOrderedMap though?).
        • Fixed all spellcheck tests to be compatible with the new response format.
        • Changed QueryResponse._spellInfo to a NamedList<Object> instead NamedList<NamedList<Object>> since correctlySpelled is at top level now.
        • Fixed DistributedSpellCheckComponentTest which started failing after this change - it was asserting that 'suggestions' was not empty for the collate test but no suggestions were actually being generated (was passing because correctlySpelled flag was in the 'suggestions' NamedList). Added a new term to the index which is within 2 edits of the query term ('quicker' and 'quick' respectively).

        cc James Dyer

        Show
        Nalini Kartha added a comment - - edited New patch taking into account comments on the JIRA - Moved correctlySpelled boolean and collation list out of suggestions. Left suggestions and collations as NamedLists based on previous comments (maybe suggestions can also be made a SimpleOrderedMap though?). Fixed all spellcheck tests to be compatible with the new response format. Changed QueryResponse._spellInfo to a NamedList<Object> instead NamedList<NamedList<Object>> since correctlySpelled is at top level now. Fixed DistributedSpellCheckComponentTest which started failing after this change - it was asserting that 'suggestions' was not empty for the collate test but no suggestions were actually being generated (was passing because correctlySpelled flag was in the 'suggestions' NamedList). Added a new term to the index which is within 2 edits of the query term ('quicker' and 'quick' respectively). cc James Dyer
        Hide
        James Dyer added a comment -

        I would think a way forward is to break backwards in the next release with an option you can set in the config that reverts it to the old format. We can remove the old format entirely in 5.0. Agree?

        Show
        James Dyer added a comment - I would think a way forward is to break backwards in the next release with an option you can set in the config that reverts it to the old format. We can remove the old format entirely in 5.0. Agree?
        Hide
        Nalini Kartha added a comment -

        Sorry, I should have mentioned that the patch provided is for trunk (figured that was safer since it is a backwards incompatible change). I can also provide a patch for branch_4x that adds a config option to revert to the old format.

        Show
        Nalini Kartha added a comment - Sorry, I should have mentioned that the patch provided is for trunk (figured that was safer since it is a backwards incompatible change). I can also provide a patch for branch_4x that adds a config option to revert to the old format.
        Hide
        James Dyer added a comment -

        its pretty easy to merge trunk changes back to 4.x. If we go this route, though, its not a straight merge as we'd need to provide a switch to let users have it the old way too. We can do 5.0-only also, nothing wrong with that imo, and easier.

        Show
        James Dyer added a comment - its pretty easy to merge trunk changes back to 4.x. If we go this route, though, its not a straight merge as we'd need to provide a switch to let users have it the old way too. We can do 5.0-only also, nothing wrong with that imo, and easier.
        Hide
        Nalini Kartha added a comment -

        Fixing in 5.0 sounds good to me too. I guess the next step is for one of the committers to review and then checkin the changes? Let me know if there's anything in the patch that needs fixing/improving.

        Show
        Nalini Kartha added a comment - Fixing in 5.0 sounds good to me too. I guess the next step is for one of the committers to review and then checkin the changes? Let me know if there's anything in the patch that needs fixing/improving.

          People

          • Assignee:
            Unassigned
            Reporter:
            Antony Stubbs
          • Votes:
            3 Vote for this issue
            Watchers:
            10 Start watching this issue

            Dates

            • Created:
              Updated:

              Development