Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-10522

Duplicate keys in "collations" object with JSON response format

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 6.5
    • Fix Version/s: 6.6
    • Component/s: spellchecker
    • Security Level: Public (Default Security Level. Issues are Public)
    • Labels:
      None

      Description

      After upgrading Solr 6.3 -> 6.5 I've noticed a change in how json response writer outputs "collations" response key when spellchecking is enabled (wt=json&json.nl=arrarr)

      Solr 6.3:

      "collations":
      [
      ["collation",

      { "collationQuery":"the", "hits":48, "maxScore":"30.282", "misspellingsAndCorrections": [ ["thea","the"]]}

      ],
      ["collation",

      { "collationQuery":"tea", "hits":3, "maxScore":"2.936", "misspellingsAndCorrections": [ ["thea","tea"]]}

      ],
      ...

      Solr 6.5:

      "collations":{
      "collation":

      { "collationQuery":"the", "hits":43, "misspellingsAndCorrections": [ ["thea","the"]]}

      ,
      "collation":

      { "collationQuery":"tea", "hits":3, "misspellingsAndCorrections": [ ["thea","tea"]]}

      ,
      ...

      Solr 6.5 outputs object instead of an array, and it has duplicate keys which is not valid for JSON format.

      Any help is appreciated.

        Issue Links

          Activity

          Hide
          Eng1neer Nikita Pchelintsev added a comment -

          It seems that https://issues.apache.org/jira/browse/SOLR-9972 introduced this change

          Show
          Eng1neer Nikita Pchelintsev added a comment - It seems that https://issues.apache.org/jira/browse/SOLR-9972 introduced this change
          Hide
          elyograg Shawn Heisey added a comment -

          I haven't looked at the code, but if I had to guess, I would imagine that the internal representation is NamedList, which doesn't have a problem with multiple mappings where the key is the same.

          Do we need to deal with backward compatiblity at all, or do we consider the new format to be completely broken? Were XML and Javabin responses also affected by SOLR-9972?

          Show
          elyograg Shawn Heisey added a comment - I haven't looked at the code, but if I had to guess, I would imagine that the internal representation is NamedList, which doesn't have a problem with multiple mappings where the key is the same. Do we need to deal with backward compatiblity at all, or do we consider the new format to be completely broken? Were XML and Javabin responses also affected by SOLR-9972 ?
          Hide
          zoran Zoran Avtarovski added a comment -

          The XML response works because of the way XML is processed. But the tags are still incorrectly identified as lst as opposed to the correct arr.

          As you point out changing from a NamedList to an ArrayList will do the trick. The changes need to occur on the org.apache.solr.handler.component.SpellCheckComponent.java object at lines : 302, 437 and 516

          If you change the type to ArrayList and the associated add element code the issue appears to be resolved (with my limited testing)

          Show
          zoran Zoran Avtarovski added a comment - The XML response works because of the way XML is processed. But the tags are still incorrectly identified as lst as opposed to the correct arr. As you point out changing from a NamedList to an ArrayList will do the trick. The changes need to occur on the org.apache.solr.handler.component.SpellCheckComponent.java object at lines : 302, 437 and 516 If you change the type to ArrayList and the associated add element code the issue appears to be resolved (with my limited testing)
          Hide
          jdyer James Dyer added a comment -

          We might need to re-think our work with SOLR-9972. My apologizes Christine Poerschke in that when I reviewed SOLR-9972, I hadn't realized we had more than 1 json format and that fixing one might break the others.

          prior to SOLR-9972, our "flat" (default) json looked like this for the collation section:

          "collations":[
              "collation",{
                  "collationQuery":"lowerfilt:(+faith +hope +loaves)",
                  "hits":1,
                  "misspellingsAndCorrections":[
                  "fauth","faith",
                  "home","hope",
                  "loane","loaves"]},
              "collation",{
                  "collationQuery":"lowerfilt:(+faith +hope +love)",
                  "hits":1,
                  "misspellingsAndCorrections":[
                  "fauth","faith",
                  "home","hope",
                  "loane","love"]}]
          

          ...by having "collations" as a NamedList, we avoid having duplicate keys with "collation". But the "arrntv" format chokes around the "collationQuery":

          "collations":
          [
              {"name":"collation",{
              "type":"str","value":"collationQuery":"lowerfilt:(+faith +hope +loaves)",
              "hits":1,
              "misspellingsAndCorrections":
              [
                  {"name":"fauth","type":"str","value":"faith"},
                  {"name":"home","type":"str","value":"hope"},
                  {"name":"loane","type":"str","value":"loaves"}]}},
              {"name":"collation",{
              "type":"str","value":"collationQuery":"lowerfilt:(+faith +hope +love)",
              "hits":1,
              "misspellingsAndCorrections":
              [
                  {"name":"fauth","type":"str","value":"faith"},
                  {"name":"home","type":"str","value":"hope"},
                  {"name":"loane","type":"str","value":"love"}]}}]
          

          ...So SOLR-9972 changed "collations" to be a SimpleOrderedMap. Now we get this for "arrntv":

          "collations":{
              "collation":{
              "collationQuery":"lowerfilt:(+faith +hope +loaves)",
              "hits":1,
              "misspellingsAndCorrections":
              [
                  {"name":"fauth","type":"str","value":"faith"},
                  {"name":"home","type":"str","value":"hope"},
                  {"name":"loane","type":"str","value":"loaves"}]},
              "collation":{
              "collationQuery":"lowerfilt:(+faith +hope +love)",
              "hits":1,
              "misspellingsAndCorrections":
              [
                  {"name":"fauth","type":"str","value":"faith"},
                  {"name":"home","type":"str","value":"hope"},
                  {"name":"loane","type":"str","value":"love"}]}}
          

          ...so now it renders valid json. But under "collations", we have duplicate keys, right? If there is more than 1 collation, the "collation" key keeps getting overwritten.

          So then, it seems that SOLR-9972 is only a partial fix for "arrntv" because while we have valid json, there are duplicate keys. But worse, SOLR-9972 broke the default json format, both from a backwards-compatibility standpoint, and also from a correctness standpoint as this is also subject to duplicate keys.

          I'd think reverting SOLR-9972 would leave us in a better situation than the current one. But can someone suggest a solution that would result in:

          • valid json for all the various json formats we support
          • no duplicate keys when there are multiple collations
          • no breaking backwards compatibility until 7.0, except for the completely-broken "arrntv" case ? (6.5 changes notwithstanding, breaking backwards here was a bug in my opinion).

          ??

          Show
          jdyer James Dyer added a comment - We might need to re-think our work with SOLR-9972 . My apologizes Christine Poerschke in that when I reviewed SOLR-9972 , I hadn't realized we had more than 1 json format and that fixing one might break the others. prior to SOLR-9972 , our "flat" (default) json looked like this for the collation section: "collations":[ "collation",{ "collationQuery":"lowerfilt:(+faith +hope +loaves)", "hits":1, "misspellingsAndCorrections":[ "fauth","faith", "home","hope", "loane","loaves"]}, "collation",{ "collationQuery":"lowerfilt:(+faith +hope +love)", "hits":1, "misspellingsAndCorrections":[ "fauth","faith", "home","hope", "loane","love"]}] ...by having "collations" as a NamedList, we avoid having duplicate keys with "collation". But the "arrntv" format chokes around the "collationQuery": "collations": [ {"name":"collation",{ "type":"str","value":"collationQuery":"lowerfilt:(+faith +hope +loaves)", "hits":1, "misspellingsAndCorrections": [ {"name":"fauth","type":"str","value":"faith"}, {"name":"home","type":"str","value":"hope"}, {"name":"loane","type":"str","value":"loaves"}]}}, {"name":"collation",{ "type":"str","value":"collationQuery":"lowerfilt:(+faith +hope +love)", "hits":1, "misspellingsAndCorrections": [ {"name":"fauth","type":"str","value":"faith"}, {"name":"home","type":"str","value":"hope"}, {"name":"loane","type":"str","value":"love"}]}}] ...So SOLR-9972 changed "collations" to be a SimpleOrderedMap. Now we get this for "arrntv": "collations":{ "collation":{ "collationQuery":"lowerfilt:(+faith +hope +loaves)", "hits":1, "misspellingsAndCorrections": [ {"name":"fauth","type":"str","value":"faith"}, {"name":"home","type":"str","value":"hope"}, {"name":"loane","type":"str","value":"loaves"}]}, "collation":{ "collationQuery":"lowerfilt:(+faith +hope +love)", "hits":1, "misspellingsAndCorrections": [ {"name":"fauth","type":"str","value":"faith"}, {"name":"home","type":"str","value":"hope"}, {"name":"loane","type":"str","value":"love"}]}} ...so now it renders valid json. But under "collations", we have duplicate keys, right? If there is more than 1 collation, the "collation" key keeps getting overwritten. So then, it seems that SOLR-9972 is only a partial fix for "arrntv" because while we have valid json, there are duplicate keys. But worse, SOLR-9972 broke the default json format, both from a backwards-compatibility standpoint, and also from a correctness standpoint as this is also subject to duplicate keys. I'd think reverting SOLR-9972 would leave us in a better situation than the current one. But can someone suggest a solution that would result in: valid json for all the various json formats we support no duplicate keys when there are multiple collations no breaking backwards compatibility until 7.0, except for the completely-broken "arrntv" case ? (6.5 changes notwithstanding, breaking backwards here was a bug in my opinion). ??
          Hide
          zoran Zoran Avtarovski added a comment -

          I thought about this when I ran into the issue and one solution which might work and still accomodate legacy use is to name the first collation key "collation" and then append a numeric value to each subsequent key. eg
          "collations":[
          "collation",

          {...},
          "collation1",{...}

          ,
          "collation2",

          {...},
          "collation3",{...}

          ,
          "collation4",

          {...}

          ]

          This way, legacy applications looking for the "collation" key would be fine but the json format would also be valid and enable parsers to parse through all the collations.

          Just a suggestion.

          Show
          zoran Zoran Avtarovski added a comment - I thought about this when I ran into the issue and one solution which might work and still accomodate legacy use is to name the first collation key "collation" and then append a numeric value to each subsequent key. eg "collations":[ "collation", {...}, "collation1",{...} , "collation2", {...}, "collation3",{...} , "collation4", {...} ] This way, legacy applications looking for the "collation" key would be fine but the json format would also be valid and enable parsers to parse through all the collations. Just a suggestion.
          Hide
          cpoerschke Christine Poerschke added a comment -

          Hi. A little late joining the conversation here.

          So SOLR-9972 intended to fix JSON output for json.nl=arrntv and other things got inadvertently broken in the process, oops. Might one way forward be to do the following three things:
          1. revert the SOLR-9972 change
          2. add logic (in SpellCheckComponent) to reject json.nl=arrntv as not-yet-supported
          3. from 7.0 onwards support json.nl=arrntv for the SpellCheckComponent (and this may entail non-backcompat changes)

          Show
          cpoerschke Christine Poerschke added a comment - Hi. A little late joining the conversation here. So SOLR-9972 intended to fix JSON output for json.nl=arrntv and other things got inadvertently broken in the process, oops. Might one way forward be to do the following three things: 1. revert the SOLR-9972 change 2. add logic (in SpellCheckComponent) to reject json.nl=arrntv as not-yet-supported 3. from 7.0 onwards support json.nl=arrntv for the SpellCheckComponent (and this may entail non-backcompat changes)
          Hide
          jdyer James Dyer added a comment -

          I do not think SpellCheckComponent (or any other part of Solr) should have to do anything special to cater to the various json formats we now support. In my opinion, the response writer should never output invalid json, and it should be its responsibility to complain when it encounters something it cannot support. If it was fixed centrally, it would make the output more consistent as similar scenarios would be handled the same way throughout the application. This would also be a bit more future-proof because it isn't exactly clear that the Solr objects that hold response data can product invalid responses, and it doesn't seem likely every new output would be tested against all of these formats.

          I think a better move forward would be:
          1. revert SOLR-9972.
          2. open a ticket that arrntv can sometimes product invalid json, using this as an example use-case. Having it reject unsupportable objects would be a valid first step.
          3. Maybe in 7.x revamp the response objects to be more response-writer-friendly, less xml-centric ?

          Show
          jdyer James Dyer added a comment - I do not think SpellCheckComponent (or any other part of Solr) should have to do anything special to cater to the various json formats we now support. In my opinion, the response writer should never output invalid json, and it should be its responsibility to complain when it encounters something it cannot support. If it was fixed centrally, it would make the output more consistent as similar scenarios would be handled the same way throughout the application. This would also be a bit more future-proof because it isn't exactly clear that the Solr objects that hold response data can product invalid responses, and it doesn't seem likely every new output would be tested against all of these formats. I think a better move forward would be: 1. revert SOLR-9972 . 2. open a ticket that arrntv can sometimes product invalid json, using this as an example use-case. Having it reject unsupportable objects would be a valid first step. 3. Maybe in 7.x revamp the response objects to be more response-writer-friendly, less xml-centric ?
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 571264bcc007a3d853f678dd7ac5b529644e938d in lucene-solr's branch refs/heads/master from jdyer1
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=571264b ]

          SOLR-10522: Revert "SOLR-9972: SpellCheckComponent collations and suggestions returned as a JSON object rather than a list"

          This reverts commit 4cd3d15da8ef77ef50e2bda91ed6d3c6e87b5426.

          Show
          jira-bot ASF subversion and git services added a comment - Commit 571264bcc007a3d853f678dd7ac5b529644e938d in lucene-solr's branch refs/heads/master from jdyer1 [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=571264b ] SOLR-10522 : Revert " SOLR-9972 : SpellCheckComponent collations and suggestions returned as a JSON object rather than a list" This reverts commit 4cd3d15da8ef77ef50e2bda91ed6d3c6e87b5426.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 82b78bd95e95b95b9e7527d9396be24a0dc221df in lucene-solr's branch refs/heads/master from jdyer1
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=82b78bd ]

          SOLR-10522: Revert "SOLR-9972: SpellCheckComponent collations and suggestions returned as a JSON object rather than a list"

          Show
          jira-bot ASF subversion and git services added a comment - Commit 82b78bd95e95b95b9e7527d9396be24a0dc221df in lucene-solr's branch refs/heads/master from jdyer1 [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=82b78bd ] SOLR-10522 : Revert " SOLR-9972 : SpellCheckComponent collations and suggestions returned as a JSON object rather than a list"
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 68f51e6a128e0d56ae37b070834bb36e00ceca50 in lucene-solr's branch refs/heads/branch_6x from jdyer1
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=68f51e6 ]

          SOLR-10522: Revert "SOLR-9972: SpellCheckComponent collations and suggestions returned as a JSON object rather than a list"

          This reverts commit 4cd3d15da8ef77ef50e2bda91ed6d3c6e87b5426.

          Show
          jira-bot ASF subversion and git services added a comment - Commit 68f51e6a128e0d56ae37b070834bb36e00ceca50 in lucene-solr's branch refs/heads/branch_6x from jdyer1 [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=68f51e6 ] SOLR-10522 : Revert " SOLR-9972 : SpellCheckComponent collations and suggestions returned as a JSON object rather than a list" This reverts commit 4cd3d15da8ef77ef50e2bda91ed6d3c6e87b5426.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 854347d5e09ea92f1d738458a05fc2224ca3d7aa in lucene-solr's branch refs/heads/branch_6x from jdyer1
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=854347d ]

          SOLR-10522: Revert "SOLR-9972: SpellCheckComponent collations and suggestions returned as a JSON object rather than a list"

          Show
          jira-bot ASF subversion and git services added a comment - Commit 854347d5e09ea92f1d738458a05fc2224ca3d7aa in lucene-solr's branch refs/heads/branch_6x from jdyer1 [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=854347d ] SOLR-10522 : Revert " SOLR-9972 : SpellCheckComponent collations and suggestions returned as a JSON object rather than a list"
          Hide
          jdyer James Dyer added a comment -

          fixed for 6.6 by reverting SOLR-9972.

          Show
          jdyer James Dyer added a comment - fixed for 6.6 by reverting SOLR-9972 .

            People

            • Assignee:
              jdyer James Dyer
              Reporter:
              Eng1neer Nikita Pchelintsev
            • Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development