Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-9442

Add json.nl=arrnvp (array of NamedValuePair) style in JSONResponseWriter

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4, 7.0
    • Component/s: Response Writers
    • Labels:
      None
    • Flags:
      Patch

      Description

      The JSONResponseWriter class currently supports several styles of NamedList output format, documented on the wiki at http://wiki.apache.org/solr/SolJSON and in the code at https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/response/JSONResponseWriter.java#L71-L76.

      For example the 'arrmap' style:

      NamedList("a"=1,"b"=2,null=3) => [{"a":1},{"b":2},3]
      NamedList("a"=1,"bar”=“foo",null=3.4f) => [{"a":1},{"bar”:”foo"},{3.4}]

      This patch creates a new style ‘arrnvp’ which is an array of named value pairs. For example:

      NamedList("a"=1,"b"=2,null=3) => [{"name":"a","int":1},{"name":"b","int":2},{"int":3}]
      NamedList("a"=1,"bar”=“foo",null=3.4f) => [{"name":"a","int":1},{"name":"b","str":"foo"},{"float":3.4}]

      This style maintains the type information of the values, similar to the xml format:

      <lst name=“someField”>
            <int name=“a”>1</int>
            <str name=“bar”>foo</str>
            <float>3.4</float>
      </lst>
      1. SOLR-9442.patch
        14 kB
        Christine Poerschke
      2. SOLR-9442.patch
        13 kB
        Jonny Marks
      3. SOLR-9442.patch
        3 kB
        Jonny Marks
      4. SOLR-9442-arrntv.patch
        9 kB
        Christine Poerschke

        Issue Links

          Activity

          Hide
          cpoerschke Christine Poerschke added a comment -

          +1 to the idea and the partial patch so far.

          Show
          cpoerschke Christine Poerschke added a comment - +1 to the idea and the partial patch so far.
          Hide
          jm100 Jonny Marks added a comment -

          Attaching full patch

          Show
          jm100 Jonny Marks added a comment - Attaching full patch
          Hide
          cpoerschke Christine Poerschke added a comment -

          Hi Jonny, thanks for the full patch.

          Please find attached a revised version of your patch with the following changes:

          • reduced duplication in new code: factored out ifNeededWriteTypeAsKey method
          • code clarity: couple of renames, extra comments, style tweaks
          • extended existing test: added JSONWriter.JSON_NL_ARROFNVP to testConstantsUnchanged
          • strengthened new test: removal of any of the 'methodsExpectedNotOverriden' methods will now cause a test failure
          • added new test: testArrnvpWriterLacksMethodsOfItsOwn

          Let me know what you think. If there are no further suggestions, comments, questions, etc. then i would proceed with committing early next week.

          Show
          cpoerschke Christine Poerschke added a comment - Hi Jonny, thanks for the full patch. Please find attached a revised version of your patch with the following changes: reduced duplication in new code: factored out ifNeededWriteTypeAsKey method code clarity: couple of renames, extra comments, style tweaks extended existing test: added JSONWriter.JSON_NL_ARROFNVP to testConstantsUnchanged strengthened new test: removal of any of the 'methodsExpectedNotOverriden' methods will now cause a test failure added new test: testArrnvpWriterLacksMethodsOfItsOwn Let me know what you think. If there are no further suggestions, comments, questions, etc. then i would proceed with committing early next week.
          Hide
          jm100 Jonny Marks added a comment -

          Thanks Christine, the changes are fine.

          Show
          jm100 Jonny Marks added a comment - Thanks Christine, the changes are fine.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 87c6ec4cb0a91e1952e4dff31d6e1f92ed0806bf in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=87c6ec4 ]

          SOLR-9442: Adds Array of NamedValuePair (json.nl=arrnvp) style to JSONResponseWriter. (Jonny Marks, Christine Poerschke)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 87c6ec4cb0a91e1952e4dff31d6e1f92ed0806bf in lucene-solr's branch refs/heads/master from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=87c6ec4 ] SOLR-9442 : Adds Array of NamedValuePair (json.nl=arrnvp) style to JSONResponseWriter. (Jonny Marks, Christine Poerschke)
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 9b424454e381c4fb44ec5702a7351bac99c01ffe in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9b42445 ]

          SOLR-9442: Adds Array of NamedValuePair (json.nl=arrnvp) style to JSONResponseWriter. (Jonny Marks, Christine Poerschke)

          Show
          jira-bot ASF subversion and git services added a comment - Commit 9b424454e381c4fb44ec5702a7351bac99c01ffe in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9b42445 ] SOLR-9442 : Adds Array of NamedValuePair (json.nl=arrnvp) style to JSONResponseWriter. (Jonny Marks, Christine Poerschke)
          Hide
          cpoerschke Christine Poerschke added a comment -

          Patch committed to master and cherry-picked to branch_6x. Created SOLR-9709 sub-task for wiki update.

          Show
          cpoerschke Christine Poerschke added a comment - Patch committed to master and cherry-picked to branch_6x. Created SOLR-9709 sub-task for wiki update.
          Hide
          hossman Hoss Man added a comment -

          I know i'm a little late to the party, but this new format seems overly cumbersome to parse – for a given "NamedValuePair" the only constant is that if the JSON "name" attribute exists, then it's value will be a String telling you the name of this Pair – if it does not exist, then you know this pair has no name. But in order to know either the type or the value of the Pair you have to either know beforehand what to expect, or iterate – either over the possibly type values ("int","str",etc...), or over the list of attribute keys in the Pair (ignoring "name").

          It seems like a better structure (that would still maintain the original goal of parity with the amount of data returned in the XML format) would be..

          NamedList("a"=1,"b"=2,null=3) 
            => [{"name":"a",   "type":"int",    "value":1},
                {"name":"b",   "type":"int",    "value":2},
                {"name":null,  "type":"int",    "value":3}]
          
          NamedList("a"=1,"bar”=“foo",null=3.4f)
            => [{"name":"a",   "type":"int",    "value":1},
                {"name":"b",   "type":"str",    "value":"foo"},
                {"name":null,  "type":"float",  "value":3.4}]
          
          NamedList("bar”=null,null=true)
            => [{"name":"bar", "type":"null",   "value":null},
                {"name":null,  "type":"bool",   "value":true}]
          

          ...the key improvement (in my mind) being: The set of JSON attributes is fixed for all Pairs: "name", "type", and "value". Clients can always look for those 3 attributes to learn everything they can learn from the equivalent XML output.

          Christine Poerschke & Jonny Marks - what do you think? better then what we have now?

          Show
          hossman Hoss Man added a comment - I know i'm a little late to the party, but this new format seems overly cumbersome to parse – for a given "NamedValuePair" the only constant is that if the JSON "name" attribute exists, then it's value will be a String telling you the name of this Pair – if it does not exist, then you know this pair has no name. But in order to know either the type or the value of the Pair you have to either know beforehand what to expect, or iterate – either over the possibly type values ("int","str",etc...), or over the list of attribute keys in the Pair (ignoring "name"). It seems like a better structure (that would still maintain the original goal of parity with the amount of data returned in the XML format) would be.. NamedList("a"=1,"b"=2,null=3) => [{"name":"a", "type":"int", "value":1}, {"name":"b", "type":"int", "value":2}, {"name":null, "type":"int", "value":3}] NamedList("a"=1,"bar”=“foo",null=3.4f) => [{"name":"a", "type":"int", "value":1}, {"name":"b", "type":"str", "value":"foo"}, {"name":null, "type":"float", "value":3.4}] NamedList("bar”=null,null=true) => [{"name":"bar", "type":"null", "value":null}, {"name":null, "type":"bool", "value":true}] ...the key improvement (in my mind) being: The set of JSON attributes is fixed for all Pairs: "name", "type", and "value". Clients can always look for those 3 attributes to learn everything they can learn from the equivalent XML output. Christine Poerschke & Jonny Marks - what do you think? better then what we have now?
          Hide
          cpoerschke Christine Poerschke added a comment -

          Hi Hoss Man, thanks for your input. Jonny and I discussed offline and here's what we think.

          ... you have to either know beforehand what to expect, or iterate ...

          Yes, with the json.nl=arrnvp {"name":"a","int":1} representation it helps to know beforehand what to expect. Our (quite possibly unusual) use case is actually to parse-and-convert the JSON response into an object validated by a schema.

          We agree that a {"name":"a", "type":"int", "value":1} representation would help avoid iterating since clients can rely on the existence of the exact 3 attributes.

          How about supporting both json.nl=arrnvp (Named Value Pair) and json.nl=arrntv (Name Type Value) representations? Attached SOLR-9442-arrntv.patch shows how both can easily share most code. What do you think?

          Show
          cpoerschke Christine Poerschke added a comment - Hi Hoss Man , thanks for your input. Jonny and I discussed offline and here's what we think. ... you have to either know beforehand what to expect, or iterate ... Yes, with the json.nl=arrnvp {"name":"a","int":1} representation it helps to know beforehand what to expect. Our (quite possibly unusual) use case is actually to parse-and-convert the JSON response into an object validated by a schema . We agree that a {"name":"a", "type":"int", "value":1} representation would help avoid iterating since clients can rely on the existence of the exact 3 attributes. How about supporting both json.nl=arrnvp (Named Value Pair) and json.nl=arrntv (Name Type Value) representations? Attached SOLR-9442 -arrntv.patch shows how both can easily share most code. What do you think?
          Hide
          hossman Hoss Man added a comment -

          What do you think?

          Well, I guess don't really see what usecases the "arrnvp" format (already committed) serves that aren't equally/better served with the "arrntv" format ... if you find it more useful then what I suggested then i guess it has a purpose as well ... i'm just not seeing it.

          I suppose my main concern with having both is making sure the tests/docs make it very clear how things behave with either the name or the value (or both) are null.

          Even after reading the original diff for this issue it's not clear to me what JSON attribute(s) will exist for the equivalents of things like <null name="bar"/> or <null/> since there is no "type" to use as a JSON attribute name ... or does that just result in something like...

          NamedList("bar”=null,null=true,null=null)
            => [{"name":"bar"},
                {"bool":true},
                {}]
          

          ...which seems really weird)

          Show
          hossman Hoss Man added a comment - What do you think? Well, I guess don't really see what usecases the "arrnvp" format (already committed) serves that aren't equally/better served with the "arrntv" format ... if you find it more useful then what I suggested then i guess it has a purpose as well ... i'm just not seeing it. I suppose my main concern with having both is making sure the tests/docs make it very clear how things behave with either the name or the value (or both) are null. Even after reading the original diff for this issue it's not clear to me what JSON attribute(s) will exist for the equivalents of things like <null name="bar"/> or <null/> since there is no "type" to use as a JSON attribute name ... or does that just result in something like... NamedList("bar”=null,null=true,null=null) => [{"name":"bar"}, {"bool":true}, {}] ...which seems really weird)
          Hide
          cpoerschke Christine Poerschke added a comment -

          Well, I guess don't really see what usecases the "arrnvp" format (already committed) serves that aren't equally/better served with the "arrntv" format ... if you find it more useful then what I suggested then i guess it has a purpose as well ... i'm just not seeing it.

          The difference, as I see it, between "arrnvp" and "arrntv" format is very subtle, let's see if perhaps an example helps illustrate it.

          Both "arrnvp" result

          NamedList(foo=42) => [ { "name":"foo", "int":42 } ]
          

          and "arrntv" result

          NamedList(foo=42) => [ { "name":"foo", "type":"int", "value":42 } ]
          

          contain information equivalent to

          <lst name="someField">
            <int name=“foo”>42</int>
          </lst>
          

          xml format.

          In the "know beforehand what to expect" use case, let's say the expectation is "int" type and hence the client does

            myName = asString(someField, "name")
            myInt = asInt(someField, "int")
          

          for the "arrnvp" format.

          For the "arrntv" format

            # option 1
            myName = asString(someField, "name")
            myInt = asInt(someField, "value")
          

          means that the "type" attribute is totally ignored or alternatively

            # option 2
            myName = asString(someField, "name")
            assert "int" == asString(someField, "type")  
            myInt = asInt(someField, "value")
          

          means "int-ness" checks happens twice since theoretically "type" and "value" could mismatch.

          I suppose my main concern with having both is making sure the tests/docs make it very clear how things behave with either the name or the value (or both) are null.

          I agree the tests and docs could have been clearer (not just for "arrnvp") as far as null name and/or null value go. SOLR-9782 has now improved them.

          ... what JSON attribute(s) will exist for the equivalents of things like <null name="bar"/> or <null/> since there is no "type" to use as a JSON attribute name ...

          Taking your NamedList("bar”=null,null=true,null=null) example,
          with "arrnvp" the result is

          [
            { "name":"bar", "null":null },
            { "name":null,  "bool":true },
            { "name":null,  "null":null }
          ]
          

          and with "arrntv" the result will be

          [
            { "name":"bar", "type":"null", "value":null },
            { "name":null,  "type":"bool", "value":true },
            { "name":null,  "type":"null", "value":null }
          ]
          

          i.e. the "type" for null values is called "null".

          Show
          cpoerschke Christine Poerschke added a comment - Well, I guess don't really see what usecases the "arrnvp" format (already committed) serves that aren't equally/better served with the "arrntv" format ... if you find it more useful then what I suggested then i guess it has a purpose as well ... i'm just not seeing it. The difference, as I see it, between "arrnvp" and "arrntv" format is very subtle, let's see if perhaps an example helps illustrate it. Both "arrnvp" result NamedList(foo=42) => [ { "name" : "foo" , " int " :42 } ] and "arrntv" result NamedList(foo=42) => [ { "name" : "foo" , "type" : " int " , "value" :42 } ] contain information equivalent to <lst name= "someField" > < int name=“foo”>42</ int > </lst> xml format. In the "know beforehand what to expect" use case, let's say the expectation is "int" type and hence the client does myName = asString(someField, "name" ) myInt = asInt(someField, " int " ) for the "arrnvp" format. For the "arrntv" format # option 1 myName = asString(someField, "name" ) myInt = asInt(someField, "value" ) means that the "type" attribute is totally ignored or alternatively # option 2 myName = asString(someField, "name" ) assert " int " == asString(someField, "type" ) myInt = asInt(someField, "value" ) means "int-ness" checks happens twice since theoretically "type" and "value" could mismatch. I suppose my main concern with having both is making sure the tests/docs make it very clear how things behave with either the name or the value (or both) are null. I agree the tests and docs could have been clearer (not just for "arrnvp") as far as null name and/or null value go. SOLR-9782 has now improved them. ... what JSON attribute(s) will exist for the equivalents of things like <null name="bar"/> or <null/> since there is no "type" to use as a JSON attribute name ... Taking your NamedList("bar”=null,null=true,null=null) example, with "arrnvp" the result is [ { "name" : "bar" , " null " : null }, { "name" : null , "bool" : true }, { "name" : null , " null " : null } ] and with "arrntv" the result will be [ { "name" : "bar" , "type" : " null " , "value" : null }, { "name" : null , "type" : "bool" , "value" : true }, { "name" : null , "type" : " null " , "value" : null } ] i.e. the "type" for null values is called "null".
          Hide
          cpoerschke Christine Poerschke added a comment -

          Created separate SOLR-9787 ticket-with-patch for the addition of "arrntv" (array of Name Type Value) support.

          Show
          cpoerschke Christine Poerschke added a comment - Created separate SOLR-9787 ticket-with-patch for the addition of "arrntv" (array of Name Type Value) support.
          Hide
          hossman Hoss Man added a comment -

          ok, well ... my main point is pretty clearly illustrated by what you said here...

          Both "arrnvp" result
          ...
          and "arrntv" result
          ...
          contain information equivalent to

          <lst name="someField">
            <int name=“foo”>42</int>
          </lst>
          

          xml format.

          In the "know beforehand what to expect" use case, let's say the expectation is "int" type and hence the client does

            myName = asString(someField, "name")
            myInt = asInt(someField, "int")
          

          for the "arrnvp" format.

          For the "arrntv" format

            # option 1
            myName = asString(someField, "name")
            myInt = asInt(someField, "value")
          

          ...if you are 100% confident in what the datatypes are, then both formats are equally useful to you – the amount of code you have to write is exactly the same. but the arrntv alternative i suggested is also useful even if you don't know the types – to me arntv is strictly an improvement.

          If you want to split arrntv out in SOLR-9787 that's fine, but i still ultimately question the value of this arrnvp for end users if we also have arrntv? – if SOLR-9787 gets committed to branch_6x before arrnvp is ever included in an official release, what's the point of leaving arrnvp in Solr? why not remove it as part of SOLR-9787?


          As for the null handling, your explanation of how arrnvp currently works when the value is null demonstrats what seems like a big gotcha in the current code – namely that if the the client is expecting an String but the actual value is null (something that seems plausable and should be accounted for in the list of semi-typical cases), then a lookup for the key "str" in the JSONObject (ie: myValue = asString(someField, "str") in your psuedocode) is going to "fail", because there won't be a key named "str" instead there is a key named "null" ... maybe for some clients that "failure" will be fine, because the failed lookup will result in returning null in the client language, which might be what the client expects, but in other cases/languages it might throw an exception of cause other problems.

          with the arrntv variant, even if you ignore the type, the "value" key would still returning null and the client shouldn't suffer any sort of catastrophic/unexpected error.

          ...which again leaves me questioning: isn't the discussed arrntv idea strictly better then the rrnvp format currently committed? if SOLR-9787 gets committed to branch_6x before arrnvp is ever included in an official release, what's the point of leaving arrnvp in Solr? why not remove it as part of SOLR-9787?

          Show
          hossman Hoss Man added a comment - ok, well ... my main point is pretty clearly illustrated by what you said here... Both "arrnvp" result ... and "arrntv" result ... contain information equivalent to <lst name="someField"> <int name=“foo”>42</int> </lst> xml format. In the "know beforehand what to expect" use case, let's say the expectation is "int" type and hence the client does myName = asString(someField, "name" ) myInt = asInt(someField, " int " ) for the "arrnvp" format. For the "arrntv" format # option 1 myName = asString(someField, "name" ) myInt = asInt(someField, "value" ) ...if you are 100% confident in what the datatypes are, then both formats are equally useful to you – the amount of code you have to write is exactly the same. but the arrntv alternative i suggested is also useful even if you don't know the types – to me arntv is strictly an improvement. If you want to split arrntv out in SOLR-9787 that's fine, but i still ultimately question the value of this arrnvp for end users if we also have arrntv? – if SOLR-9787 gets committed to branch_6x before arrnvp is ever included in an official release, what's the point of leaving arrnvp in Solr? why not remove it as part of SOLR-9787 ? As for the null handling, your explanation of how arrnvp currently works when the value is null demonstrats what seems like a big gotcha in the current code – namely that if the the client is expecting an String but the actual value is null (something that seems plausable and should be accounted for in the list of semi-typical cases), then a lookup for the key "str" in the JSONObject (ie: myValue = asString(someField, "str") in your psuedocode) is going to "fail", because there won't be a key named "str" instead there is a key named "null" ... maybe for some clients that "failure" will be fine, because the failed lookup will result in returning null in the client language, which might be what the client expects, but in other cases/languages it might throw an exception of cause other problems. with the arrntv variant, even if you ignore the type, the "value" key would still returning null and the client shouldn't suffer any sort of catastrophic/unexpected error. ...which again leaves me questioning: isn't the discussed arrntv idea strictly better then the rrnvp format currently committed? if SOLR-9787 gets committed to branch_6x before arrnvp is ever included in an official release, what's the point of leaving arrnvp in Solr? why not remove it as part of SOLR-9787 ?
          Hide
          cpoerschke Christine Poerschke added a comment -

          As for the null handling ...

          Alright, let's proceed with "arrntv" as the replacement for "arrnvp" to avoid the subtle differences between the formats and also to not unnecessarily pose the "Should I use 'arrntv' or 'arrnvp'?" question for users.

          Show
          cpoerschke Christine Poerschke added a comment - As for the null handling ... Alright, let's proceed with "arrntv" as the replacement for "arrnvp" to avoid the subtle differences between the formats and also to not unnecessarily pose the "Should I use 'arrntv' or 'arrnvp'?" question for users.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit e4ef4239f1b23afb116868e8528f1cd947287bd9 in lucene-solr's branch refs/heads/master from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e4ef423 ]

          SOLR-9787, SOLR-9442: Replace json.nl=arrnvp with json.nl=arrntv (array of Name Type Value) style in JSONResponseWriter

          Show
          jira-bot ASF subversion and git services added a comment - Commit e4ef4239f1b23afb116868e8528f1cd947287bd9 in lucene-solr's branch refs/heads/master from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e4ef423 ] SOLR-9787 , SOLR-9442 : Replace json.nl=arrnvp with json.nl=arrntv (array of Name Type Value) style in JSONResponseWriter
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 73e50ceceb836421a176fe0fa2098a20b3c8a304 in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=73e50ce ]

          SOLR-9787, SOLR-9442: Replace json.nl=arrnvp with json.nl=arrntv (array of Name Type Value) style in JSONResponseWriter

          Show
          jira-bot ASF subversion and git services added a comment - Commit 73e50ceceb836421a176fe0fa2098a20b3c8a304 in lucene-solr's branch refs/heads/branch_6x from Christine Poerschke [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=73e50ce ] SOLR-9787 , SOLR-9442 : Replace json.nl=arrnvp with json.nl=arrntv (array of Name Type Value) style in JSONResponseWriter
          Show
          cpoerschke Christine Poerschke added a comment - https://cwiki.apache.org/confluence/display/solr/Response+Writers#ResponseWriters-JSONResponseWriter updated to include example for json.nl=arrntv

            People

            • Assignee:
              cpoerschke Christine Poerschke
              Reporter:
              jm100 Jonny Marks
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development