Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-3029

Poor json formatting of spelling collation info

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 5.0, 6.0
    • Component/s: spellchecker
    • Labels:
      None

      Description

      "spellcheck": {
          "suggestions": [
          "dalllas",
          {
      <snip>
              {
                  "word": "canallas",
                  "freq": 1
              }
              ]
          },
          "correctlySpelled",
          false,
          "collation",
          "dallas"
          ]
      }
      

      The correctlySpelled and collation key/values are stored as consecutive elements in an array - quite odd. Is there a reason isn't not a key/value map like most things?

      1. SOLR-3029.patch
        28 kB
        James Dyer
      2. SOLR-3029.patch
        30 kB
        Nalini Kartha
      3. SOLR-3029.patch
        0.8 kB
        Rafał Kuć

        Issue Links

          Activity

          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Looks like someone accidentally used a NamedList (and thus signaled that preserving order was most important) vs SimpleOrderedMap (which signals that access by key is most important). This is sometimes overlooked since they both serialize to the same thing in XML.

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Looks like someone accidentally used a NamedList (and thus signaled that preserving order was most important) vs SimpleOrderedMap (which signals that access by key is most important). This is sometimes overlooked since they both serialize to the same thing in XML.
          Hide
          astubbs Antony Stubbs added a comment -

          I imagine this will also be in 3.5?

          Show
          astubbs Antony Stubbs added a comment - I imagine this will also be in 3.5?
          Hide
          gro Rafał Kuć added a comment -

          Simple patch correcting the isssue.

          Show
          gro Rafał Kuć added a comment - Simple patch correcting the isssue.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Any objections to changing this for both 3.6 and 4.0?
          It's technically a back compat break, but it was messed up to begin with (and the change should not affect clients using javabin or xml).

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Any objections to changing this for both 3.6 and 4.0? It's technically a back compat break, but it was messed up to begin with (and the change should not affect clients using javabin or xml).
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          OK, this may be too much for 3.6.
          I tried an example and it looks like there are a couple of places with this problem.

          I edited solr.xml so the name field looked like

            <field name="name">Solr, the Enterprise Search Server solra solrb abcdefgx abcdefgy</field>
          

          Then I tried the following spellcheck command:

          http://localhost:8983/solr/spell?q=abcdefgq%20solrz&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.collateExtendedResults=true&spellcheck.count=3&spellcheck.maxCollations=3&wt=json&indent=true

            "spellcheck":{
              "suggestions":[
                "abcdefgq",{
                  "numFound":2,
                  "startOffset":0,
                  "endOffset":8,
                  "suggestion":["abcdefgx",
                    "abcdefgy"]},
                "solrz",{
                  "numFound":3,
                  "startOffset":9,
                  "endOffset":14,
                  "suggestion":["solra",
                    "solrb",
                    "solr"]},
                "collation",[
                  "collationQuery","abcdefgx solra",
                  "hits",0,
                  "misspellingsAndCorrections",[
                    "abcdefgq","abcdefgx",
                    "solrz","solra"]]]}}
          

          Apologies if I'm misinterpreting some things - I don't have much experience with the spellchecker stuff (other than trying to clean up the tests in the past).
          Observations:

          • Unless order is really important, "suggestions" should be a map
          • same for "collation" and "misspellingsAndCorrections"
          • why is "collation" inside "suggestions" along with other words? should this be one level higher?
          • why isn't this giving me multiple collations... am I misunderstanding the spellcheck.maxCollations parameter
          • why aren't multiple suggestions returned in misspellingsAndCorrections? (and what's the purpose of this seemingly redundant info anyway?)
          • I briefly tried distributed search by adding &shards=localhost:8983/solr&shards.qt=/spell and I get
              "error":{
                "msg":"isShard is only acceptable with search handlers",
                "code":400}}
            
          Show
          yseeley@gmail.com Yonik Seeley added a comment - OK, this may be too much for 3.6. I tried an example and it looks like there are a couple of places with this problem. I edited solr.xml so the name field looked like <field name= "name" >Solr, the Enterprise Search Server solra solrb abcdefgx abcdefgy</field> Then I tried the following spellcheck command: http://localhost:8983/solr/spell?q=abcdefgq%20solrz&spellcheck=true&spellcheck.collate=true&spellcheck.build=true&spellcheck.collateExtendedResults=true&spellcheck.count=3&spellcheck.maxCollations=3&wt=json&indent=true "spellcheck" :{ "suggestions" :[ "abcdefgq" ,{ "numFound" :2, "startOffset" :0, "endOffset" :8, "suggestion" :[ "abcdefgx" , "abcdefgy" ]}, "solrz" ,{ "numFound" :3, "startOffset" :9, "endOffset" :14, "suggestion" :[ "solra" , "solrb" , "solr" ]}, "collation" ,[ "collationQuery" , "abcdefgx solra" , "hits" ,0, "misspellingsAndCorrections" ,[ "abcdefgq" , "abcdefgx" , "solrz" , "solra" ]]]}} Apologies if I'm misinterpreting some things - I don't have much experience with the spellchecker stuff (other than trying to clean up the tests in the past). Observations: Unless order is really important, "suggestions" should be a map same for "collation" and "misspellingsAndCorrections" why is "collation" inside "suggestions" along with other words? should this be one level higher? why isn't this giving me multiple collations... am I misunderstanding the spellcheck.maxCollations parameter why aren't multiple suggestions returned in misspellingsAndCorrections? (and what's the purpose of this seemingly redundant info anyway?) I briefly tried distributed search by adding &shards=localhost:8983/solr&shards.qt=/spell and I get "error" :{ "msg" : "isShard is only acceptable with search handlers" , "code" :400}}
          Hide
          jdyer James Dyer added a comment -

          Yonik,

          I can answer some of your questions. I do agree the spellcheck response format leaves something to be desired and maybe 4.0 is a good time to break backwards and improve it.

          Unless order is really important, "suggestions" should be a map

          I don't see why order would matter here, although some users might like to see the corrections listed in the order they appeared in the query.

          same for "collation"

          The collations are ranked, so order is important.

          and "misspellingsAndCorrections"

          The order shouldn't matter unless users are picky about the corrections being presented in the order they occur in the query.

          why is "collation" inside "suggestions" along with other words? should this be one level higher?

          This always confused me too. I agree it should be one level higher.

          why isn't this giving me multiple collations

          This is a bug. See SOLR-2853.

          why aren't multiple suggestions returned in misspellingsAndCorrections? (and what's the purpose ...?)

          This is nested with the Collation and gives details, for that particular collation, which misspelled word got which replacement. This makes it easy for clients to generate messages like "no results found for abcdefgq ... Showing abcdefgx instead!" You can suppress this information by not specifying "spellcheck.collateExtendedResults=true". For users (like me) who are interested in the collations only and don't care about individual-word corrections, it would be nice if we could suppress the first section of the response entirely.

          I briefly tried distributed search...

          DistributedSpellCheckComponentTest is supposed to detect problems like this but maybe something is going on and there is a bug this test isn't catching?

          For what its worth you had voiced some misgivings about the JSON format when the multiple-collations feature was added. At that time I supplied a quick patch to address your concerns. I'm not sure if that patch fixes the problem described here. See SOLR-2010 and your comment from Oct 16, 2010 and the (now outdated, never committed) patch I supplied on Oct 20.

          The patch on this issue causes multiple test failures although I didn't look into them.

          Show
          jdyer James Dyer added a comment - Yonik, I can answer some of your questions. I do agree the spellcheck response format leaves something to be desired and maybe 4.0 is a good time to break backwards and improve it. Unless order is really important, "suggestions" should be a map I don't see why order would matter here, although some users might like to see the corrections listed in the order they appeared in the query. same for "collation" The collations are ranked, so order is important. and "misspellingsAndCorrections" The order shouldn't matter unless users are picky about the corrections being presented in the order they occur in the query. why is "collation" inside "suggestions" along with other words? should this be one level higher? This always confused me too. I agree it should be one level higher. why isn't this giving me multiple collations This is a bug. See SOLR-2853 . why aren't multiple suggestions returned in misspellingsAndCorrections? (and what's the purpose ...?) This is nested with the Collation and gives details, for that particular collation, which misspelled word got which replacement. This makes it easy for clients to generate messages like "no results found for abcdefgq ... Showing abcdefgx instead!" You can suppress this information by not specifying "spellcheck.collateExtendedResults=true". For users (like me) who are interested in the collations only and don't care about individual-word corrections, it would be nice if we could suppress the first section of the response entirely. I briefly tried distributed search... DistributedSpellCheckComponentTest is supposed to detect problems like this but maybe something is going on and there is a bug this test isn't catching? For what its worth you had voiced some misgivings about the JSON format when the multiple-collations feature was added. At that time I supplied a quick patch to address your concerns. I'm not sure if that patch fixes the problem described here. See SOLR-2010 and your comment from Oct 16, 2010 and the (now outdated, never committed) patch I supplied on Oct 20. The patch on this issue causes multiple test failures although I didn't look into them.
          Hide
          yseeley@gmail.com Yonik Seeley added a comment -

          Thanks for the explanations! (and yeah, I had lost track of the older issues...)

          The collations are ranked, so order is important.

          I guess I meant for a single collation:

          "collation",[
                  "collationQuery","abcdefgx solra",
                  "hits",0,
                  "misspellingsAndCorrections",[...
          

          It seems like that should be a map?

          {"collationQuery" : "abcdefgx solra",
           "hits" : 0
           "misspellingsAndCorrections" : {...
          }
          

          And if there are multiple collations, there should be an array of those maps?

          I do agree the spellcheck response format leaves something to be desired and maybe 4.0 is a good time to break backwards and improve it.

          +1, go for it!

          Show
          yseeley@gmail.com Yonik Seeley added a comment - Thanks for the explanations! (and yeah, I had lost track of the older issues...) The collations are ranked, so order is important. I guess I meant for a single collation: "collation" ,[ "collationQuery" , "abcdefgx solra" , "hits" ,0, "misspellingsAndCorrections" ,[... It seems like that should be a map? { "collationQuery" : "abcdefgx solra" , "hits" : 0 "misspellingsAndCorrections" : {... } And if there are multiple collations, there should be an array of those maps? I do agree the spellcheck response format leaves something to be desired and maybe 4.0 is a good time to break backwards and improve it. +1, go for it!
          Hide
          dbejean Dominique Béjean added a comment - - edited

          What about this issue ? It looks like nothing append during the last year.
          Here is a sample of an invalid json response for the collation with Solr 4.1

          "spellcheck":{
              "suggestions":{
                ...
                "correctlySpelled":false,
                "collation":"lang  louis",
                "collation":"long  loic",
                "collation":"land  loic"}},
                ...
          

          Collation should be an array

          ...
              "collation": [
                  "lang  louis",
                  "long  loic",
                  "land  loic"
              ]
          ...
          

          +1 !

          Show
          dbejean Dominique Béjean added a comment - - edited What about this issue ? It looks like nothing append during the last year. Here is a sample of an invalid json response for the collation with Solr 4.1 "spellcheck" :{     "suggestions" :{      ...       "correctlySpelled" : false ,       "collation" : "lang  louis" ,       "collation" : " long  loic" ,       "collation" : "land  loic" }}, ... Collation should be an array ... "collation" : [ "lang  louis" , " long  loic" , "land  loic" ] ... +1 !
          Hide
          huggy06 Jérôme Termes added a comment -

          Same for me
          +1, go for it!

          Show
          huggy06 Jérôme Termes added a comment - Same for me +1, go for it!
          Hide
          steve_rowe Steve Rowe added a comment -

          Bulk move 4.4 issues to 4.5 and 5.0

          Show
          steve_rowe Steve Rowe added a comment - Bulk move 4.4 issues to 4.5 and 5.0
          Hide
          thetaphi Uwe Schindler added a comment -

          Move issue to Solr 4.9.

          Show
          thetaphi Uwe Schindler added a comment - Move issue to Solr 4.9.
          Hide
          nalinikartha Nalini Kartha added a comment - - edited

          New patch taking into account comments on the JIRA -

          • Moved correctlySpelled boolean and collation list out of suggestions. Left suggestions and collations as NamedLists based on previous comments (maybe suggestions can also be made a SimpleOrderedMap though?).
          • Fixed all spellcheck tests to be compatible with the new response format.
          • Changed QueryResponse._spellInfo to a NamedList<Object> instead NamedList<NamedList<Object>> since correctlySpelled is at top level now.
          • Fixed DistributedSpellCheckComponentTest which started failing after this change - it was asserting that 'suggestions' was not empty for the collate test but no suggestions were actually being generated (was passing because correctlySpelled flag was in the 'suggestions' NamedList). Added a new term to the index which is within 2 edits of the query term ('quicker' and 'quick' respectively).

          cc James Dyer

          Show
          nalinikartha Nalini Kartha added a comment - - edited New patch taking into account comments on the JIRA - Moved correctlySpelled boolean and collation list out of suggestions. Left suggestions and collations as NamedLists based on previous comments (maybe suggestions can also be made a SimpleOrderedMap though?). Fixed all spellcheck tests to be compatible with the new response format. Changed QueryResponse._spellInfo to a NamedList<Object> instead NamedList<NamedList<Object>> since correctlySpelled is at top level now. Fixed DistributedSpellCheckComponentTest which started failing after this change - it was asserting that 'suggestions' was not empty for the collate test but no suggestions were actually being generated (was passing because correctlySpelled flag was in the 'suggestions' NamedList). Added a new term to the index which is within 2 edits of the query term ('quicker' and 'quick' respectively). cc James Dyer
          Hide
          jdyer James Dyer added a comment -

          I would think a way forward is to break backwards in the next release with an option you can set in the config that reverts it to the old format. We can remove the old format entirely in 5.0. Agree?

          Show
          jdyer James Dyer added a comment - I would think a way forward is to break backwards in the next release with an option you can set in the config that reverts it to the old format. We can remove the old format entirely in 5.0. Agree?
          Hide
          nalinikartha Nalini Kartha added a comment -

          Sorry, I should have mentioned that the patch provided is for trunk (figured that was safer since it is a backwards incompatible change). I can also provide a patch for branch_4x that adds a config option to revert to the old format.

          Show
          nalinikartha Nalini Kartha added a comment - Sorry, I should have mentioned that the patch provided is for trunk (figured that was safer since it is a backwards incompatible change). I can also provide a patch for branch_4x that adds a config option to revert to the old format.
          Hide
          jdyer James Dyer added a comment -

          its pretty easy to merge trunk changes back to 4.x. If we go this route, though, its not a straight merge as we'd need to provide a switch to let users have it the old way too. We can do 5.0-only also, nothing wrong with that imo, and easier.

          Show
          jdyer James Dyer added a comment - its pretty easy to merge trunk changes back to 4.x. If we go this route, though, its not a straight merge as we'd need to provide a switch to let users have it the old way too. We can do 5.0-only also, nothing wrong with that imo, and easier.
          Hide
          nalinikartha Nalini Kartha added a comment -

          Fixing in 5.0 sounds good to me too. I guess the next step is for one of the committers to review and then checkin the changes? Let me know if there's anything in the patch that needs fixing/improving.

          Show
          nalinikartha Nalini Kartha added a comment - Fixing in 5.0 sounds good to me too. I guess the next step is for one of the committers to review and then checkin the changes? Let me know if there's anything in the patch that needs fixing/improving.
          Hide
          nalinikartha Nalini Kartha added a comment -

          Hi James Dyer,

          Sorry to bug you but just wanted to check if you had a chance to look at the patch?

          Would appreciate your feedback. Thanks!

          Show
          nalinikartha Nalini Kartha added a comment - Hi James Dyer , Sorry to bug you but just wanted to check if you had a chance to look at the patch? Would appreciate your feedback. Thanks!
          Hide
          jdyer James Dyer added a comment -

          Here is a svn-style version of Nalini's patch, with 1 additional test fix.

          Show
          jdyer James Dyer added a comment - Here is a svn-style version of Nalini's patch, with 1 additional test fix.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1617572 from jdyer@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1617572 ]

          SOLR-3029: Spellcheck response format changes

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1617572 from jdyer@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1617572 ] SOLR-3029 : Spellcheck response format changes
          Hide
          jdyer James Dyer added a comment -

          Committed to Trunk & added information about the response format change in CHANGES.txt for 5.0.

          Thanks, Nalini!

          Show
          jdyer James Dyer added a comment - Committed to Trunk & added information about the response format change in CHANGES.txt for 5.0. Thanks, Nalini!
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1617602 from Erik Hatcher in branch 'dev/trunk'
          [ https://svn.apache.org/r1617602 ]

          SOLR-3029: adjust /browse did-you-mean output to new collation response format

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1617602 from Erik Hatcher in branch 'dev/trunk' [ https://svn.apache.org/r1617602 ] SOLR-3029 : adjust /browse did-you-mean output to new collation response format
          Hide
          anshumg Anshum Gupta added a comment -

          Bulk close after 5.0 release.

          Show
          anshumg Anshum Gupta added a comment - Bulk close after 5.0 release.

            People

            • Assignee:
              jdyer James Dyer
              Reporter:
              astubbs Antony Stubbs
            • Votes:
              3 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development