Solr
  1. Solr
  2. SOLR-6294

The JsonLoader should accept a single doc without wrapping in an array

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: update
    • Labels:
      None

      Description

      This is the multi document input command

      curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d '
      [
       {"id" : "TestDoc1", "title" : "test1"},
      ]'
      

      The following also should be a valid update command for a single doc

      curl http://localhost:8983/solr/update/json -H 'Content-type:application/json' -d '
       {"id" : "TestDoc1", "title" : "test1"},
      '
      

        Activity

        Hide
        Hoss Man added a comment -

        that would make a sending a single json object ambiguous: does it represent a solr document to add, or does it represent a set of update commands?

        is this a single document containing 1 boolean field, or is this a command to do a commit?

        { "commit":true }
        
        Show
        Hoss Man added a comment - that would make a sending a single json object ambiguous: does it represent a solr document to add, or does it represent a set of update commands? is this a single document containing 1 boolean field, or is this a command to do a commit? { "commit" : true }
        Hide
        Noble Paul added a comment -

        There are 2 options here

        pass an extra request param like single=true (or something else)
        or
        just read the first key and check if they are one of "add", "commit", "delete", "optimize" . If not it is a single doc

        Show
        Noble Paul added a comment - There are 2 options here pass an extra request param like single=true (or something else) or just read the first key and check if they are one of "add", "commit", "delete", "optimize" . If not it is a single doc
        Hide
        Hoss Man added a comment -

        pass an extra request param like single=true (or something else)

        +0

        just read the first key and check if they are one of "add", "commit", "delete", "optimize" . If not it is a single doc

        -1 ... that's a time bomb waiting to go off for any user who might someday have a doc with "add" as a fieldname, or for any use who might have any field name in their doc that we might later introduce as a new command name.

        Show
        Hoss Man added a comment - pass an extra request param like single=true (or something else) +0 just read the first key and check if they are one of "add", "commit", "delete", "optimize" . If not it is a single doc -1 ... that's a time bomb waiting to go off for any user who might someday have a doc with "add" as a fieldname, or for any use who might have any field name in their doc that we might later introduce as a new command name.
        Hide
        Yonik Seeley added a comment -

        For more history see SOLR-2496

        pass an extra request param like single=true (or something else)

        Is that really easier than putting the doc in an array?

        Show
        Yonik Seeley added a comment - For more history see SOLR-2496 pass an extra request param like single=true (or something else) Is that really easier than putting the doc in an array?
        Hide
        Hoss Man added a comment -

        Is that really easier than putting the doc in an array?

        I guess it would at least be something you could configure as a handler default, for when you want to setup a handler that can take either a single document or a list of documents, but not arbitrary commands.

        Show
        Hoss Man added a comment - Is that really easier than putting the doc in an array? I guess it would at least be something you could configure as a handler default, for when you want to setup a handler that can take either a single document or a list of documents, but not arbitrary commands.
        Hide
        Noble Paul added a comment -

        It's not a big deal if you control how the json is created. Imagine someone piping the output of another program to solr .in that case user will have to do some coding to put those square brackets at the beginning and end

        Show
        Noble Paul added a comment - It's not a big deal if you control how the json is created. Imagine someone piping the output of another program to solr .in that case user will have to do some coding to put those square brackets at the beginning and end
        Hide
        Timothy Potter added a comment -

        Maybe we just leave it as-is (require the wrapping [ ]) and improve the error message as right now, this is what you get back from Solr when you add a single JSON document:

        curl http://localhost:8983/solr/gettingstarted/update?commit=true -X POST -H 'Content-Type: application/json' \
        -d '

        { "id":"foo", "POST_ID":"354-20160", ... }

        '

        {"responseHeader":

        {"status":400,"QTime":0}

        ,"error":{"msg":"Unknown command: id [6]","code":400}}

        So at the very least, we need to give the user an idea of how to move forward w/o having to post to the mailing list or search the docs.

        Show
        Timothy Potter added a comment - Maybe we just leave it as-is (require the wrapping [ ]) and improve the error message as right now, this is what you get back from Solr when you add a single JSON document: curl http://localhost:8983/solr/gettingstarted/update?commit=true -X POST -H 'Content-Type: application/json' \ -d ' { "id":"foo", "POST_ID":"354-20160", ... } ' {"responseHeader": {"status":400,"QTime":0} ,"error":{"msg":"Unknown command: id [6] ","code":400}} So at the very least, we need to give the user an idea of how to move forward w/o having to post to the mailing list or search the docs.
        Hide
        Grant Ingersoll added a comment -

        Is that really easier than putting the doc in an array?

        If you're data already is written out in JSON and you just want to get started by throwing it into Solr in your first few minutes of trying it out, it could be. Just one more thing to have to deal with, I guess, but obviously not a show stopper by any stretch.

        Show
        Grant Ingersoll added a comment - Is that really easier than putting the doc in an array? If you're data already is written out in JSON and you just want to get started by throwing it into Solr in your first few minutes of trying it out, it could be. Just one more thing to have to deal with, I guess, but obviously not a show stopper by any stretch.
        Hide
        Steve Rowe added a comment - - edited

        "add", "commit" and "optimize" expect as their value an array of objects or an object.

        "delete" can take a scalar id, or an array of id scalars, or an object with a query or "id"+id scalar and optional params (whew).

        "rollback" takes no value (I think this may be a parsing bug - does JSON accept a value-less key?)

        By contrast, a document (with the exception of nested docs) will always have flat "key":scalar or "key":array-of-scalar structure.

        So with the exception of "delete", I think it should possible to detect commands versus fields based exclusively on value types. Maybe just have a documented exception that "delete" field names are disallowed in single document mode?

        Show
        Steve Rowe added a comment - - edited "add", "commit" and "optimize" expect as their value an array of objects or an object. "delete" can take a scalar id, or an array of id scalars, or an object with a query or "id"+id scalar and optional params (whew). "rollback" takes no value (I think this may be a parsing bug - does JSON accept a value-less key?) By contrast, a document (with the exception of nested docs) will always have flat "key":scalar or "key":array-of-scalar structure. So with the exception of "delete", I think it should possible to detect commands versus fields based exclusively on value types. Maybe just have a documented exception that "delete" field names are disallowed in single document mode?
        Hide
        Yonik Seeley added a comment -

        "add", "commit" and "optimize" expect as their value an array of objects or an object. [...] By contrast, a document (with the exception of nested docs) will always have flat "key":scalar or "key":array-of-scalar structure.

        Not a doc with partial/atomic updates.

        "rollback" takes no value (I think this may be a parsing bug - does JSON accept a value-less key?)

        No, it's not valid JSON. Are you sure it's currently accepted?

        Show
        Yonik Seeley added a comment - "add", "commit" and "optimize" expect as their value an array of objects or an object. [...] By contrast, a document (with the exception of nested docs) will always have flat "key":scalar or "key":array-of-scalar structure. Not a doc with partial/atomic updates. "rollback" takes no value (I think this may be a parsing bug - does JSON accept a value-less key?) No, it's not valid JSON. Are you sure it's currently accepted?
        Hide
        Hoss Man added a comment -

        By contrast, a document (with the exception of nested docs) will always have flat ...

        ...the exception that complicates the rule.

        Hueristics would just lead us down a rabbit hole of complexity with no good way out.

        the idea of a "json.command" (or something like that) request param seems like it solves all of the suggested usecases.

        we can always add an "/update/json/single" or something to the sample configs if people find even that burdensome ... we can also change the default value of the new request param (and thus, the default behavior) in 5.0 so that by default we assume everything is a doc, and you have to send "json.command=true" if you want your top level braces to be interpreted as starting a set of commands.

        Show
        Hoss Man added a comment - By contrast, a document (with the exception of nested docs) will always have flat ... ...the exception that complicates the rule. Hueristics would just lead us down a rabbit hole of complexity with no good way out. the idea of a "json.command" (or something like that) request param seems like it solves all of the suggested usecases. we can always add an "/update/json/single" or something to the sample configs if people find even that burdensome ... we can also change the default value of the new request param (and thus, the default behavior) in 5.0 so that by default we assume everything is a doc, and you have to send "json.command=true" if you want your top level braces to be interpreted as starting a set of commands.
        Hide
        Noble Paul added a comment -

        we can always add an "/update/json/single"

        or /update/json/doc Yes, I was about to suggest this .This is a better idea. And I should be able to keep pumping docs after docs in a single stream

        The point is, I rarely find json embedded in square brackets in the wild . And it is burdensome for users to edit large json to add square brackets in the beginning and end

        Show
        Noble Paul added a comment - we can always add an "/update/json/single" or /update/json/doc Yes, I was about to suggest this .This is a better idea. And I should be able to keep pumping docs after docs in a single stream The point is, I rarely find json embedded in square brackets in the wild . And it is burdensome for users to edit large json to add square brackets in the beginning and end
        Hide
        Steve Rowe added a comment - - edited

        "rollback" takes no value (I think this may be a parsing bug - does JSON accept a value-less key?)

        No, it's not valid JSON. Are you sure it's currently accepted?

        I'm wrong, I misread the source - actually "rollback" requires its value to be an empty object (JsonLoader.parseRollback()).

        Show
        Steve Rowe added a comment - - edited "rollback" takes no value (I think this may be a parsing bug - does JSON accept a value-less key?) No, it's not valid JSON. Are you sure it's currently accepted? I'm wrong, I misread the source - actually "rollback" requires its value to be an empty object ( JsonLoader.parseRollback() ).
        Hide
        Yonik Seeley added a comment -

        The point is, I rarely find json embedded in square brackets in the wild

        I don't think anyone is disagreeing about that (in the absence of other concerns). The question is what to do about it.

        Maybe we should think about adding a new REST endpoint via restlet?

        Show
        Yonik Seeley added a comment - The point is, I rarely find json embedded in square brackets in the wild I don't think anyone is disagreeing about that (in the absence of other concerns). The question is what to do about it. Maybe we should think about adding a new REST endpoint via restlet?
        Hide
        Noble Paul added a comment -

        Another suggestion is to use http PUT for adding docs and POST for commands

        Show
        Noble Paul added a comment - Another suggestion is to use http PUT for adding docs and POST for commands
        Hide
        Steve Rowe added a comment -

        or /update/json/doc

        and/or just /update/doc, with content-type routing like current /update

        Show
        Steve Rowe added a comment - or /update/json/doc and/or just /update/doc , with content-type routing like current /update
        Hide
        Noble Paul added a comment -

        and/or just /update/doc, with content-type routing like current /update

        I feel having that extra type info in the uri helps rather than as an http header i. Isn't it nice to have /update/json than /update -H 'Content-type:application/json'

        Show
        Noble Paul added a comment - and/or just /update/doc, with content-type routing like current /update I feel having that extra type info in the uri helps rather than as an http header i. Isn't it nice to have /update/json than /update -H 'Content-type:application/json'
        Hide
        Steve Rowe added a comment -

        and/or just /update/doc, with content-type routing like current /update

        I feel having that extra type info in the uri helps rather than as an http header i. Isn't it nice to have /update/json than /update -H 'Content-type:application/json'

        I agree, /update/json/doc is quicker to type in on the command line.

        Show
        Steve Rowe added a comment - and/or just /update/doc, with content-type routing like current /update I feel having that extra type info in the uri helps rather than as an http header i. Isn't it nice to have /update/json than /update -H 'Content-type:application/json' I agree, /update/json/doc is quicker to type in on the command line.
        Hide
        Yonik Seeley added a comment -

        Isn't it nice to have /update/json than /update -H 'Content-type:application/json'

        I agree. Of course making it simpler on the command line (i.e. avoid having to set the content type header) requires ignoring the content-type sent by curl. I'm for that... but not sure what others may think. Some people get aggravated when I've ignored standards in favor of usability in the past.

        Show
        Yonik Seeley added a comment - Isn't it nice to have /update/json than /update -H 'Content-type:application/json' I agree. Of course making it simpler on the command line (i.e. avoid having to set the content type header) requires ignoring the content-type sent by curl. I'm for that... but not sure what others may think. Some people get aggravated when I've ignored standards in favor of usability in the past.
        Hide
        Noble Paul added a comment -

        OK , So I'm going with the update/json/doc option where you can just put one or more docs

        Show
        Noble Paul added a comment - OK , So I'm going with the update/json/doc option where you can just put one or more docs
        Hide
        Varun Thacker added a comment -

        Currently this is the behaviour -

        Request:

        curl http://localhost:8983/solr/update/json?commit=true -d '
        [
         {"id" : "1", "title" : "test"}
        ]'
        

        Response:

        <?xml version="1.0" encoding="UTF-8"?>
        <response>
        <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
        </response>
        

        No document gets added. Confused as to why the response gave a status=0 and did not add the document.

        But adding the content header type leads to document add -

        curl http://localhost:8983/solr/update/json?commit=true -H 'Content-type:application/json' -d '
        [
         {"id" : "1", "title" : "test"}
        ]'
        

        So the content type header gets respected currently.

        Show
        Varun Thacker added a comment - Currently this is the behaviour - Request: curl http://localhost:8983/solr/update/json?commit=true -d ' [ {"id" : "1", "title" : "test"} ]' Response: <?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst> </response> No document gets added. Confused as to why the response gave a status=0 and did not add the document. But adding the content header type leads to document add - curl http://localhost:8983/solr/update/json?commit=true -H 'Content-type:application/json' -d ' [ {"id" : "1", "title" : "test"} ]' So the content type header gets respected currently.
        Hide
        Yonik Seeley added a comment -

        No document gets added.

        Lovely. (sarcasm)
        It's treating the body as key/value query parameters (i.e. respecting the content-type that curl puts in there by default).
        You can see this in the log where the whole body is treated as a key without a value:

        19602 [qtp316753575-14] INFO  org.apache.solr.update.processor.LogUpdateProcessor  – [collection1] webapp=/solr path=/update/json params={commit=true&
        [
         {"id" : "1", "title" : "test"}
        ]=} {commit=} 0 6
        
        Show
        Yonik Seeley added a comment - No document gets added. Lovely. (sarcasm) It's treating the body as key/value query parameters (i.e. respecting the content-type that curl puts in there by default). You can see this in the log where the whole body is treated as a key without a value: 19602 [qtp316753575-14] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update/json params={commit= true & [ { "id" : "1" , "title" : "test" } ]=} {commit=} 0 6
        Hide
        Noble Paul added a comment -

        Varun Thacker Please open annother bug and we should fix it

        Show
        Noble Paul added a comment - Varun Thacker Please open annother bug and we should fix it
        Hide
        Grant Ingersoll added a comment -

        I really don't like having separate APIs for one doc vs. multiple docs. How about deprecating the existing approach in favor a new one that properly captures commands, single docs, and multiple docs and is clean for users?

        The whole point of stuff like this is to make it easier for people first coming to Solr to not have to think about all the gotchas and just get their data in. Ease of use has to be all about the user's data and their questions for it, not about idiosyncrasies in API design to workaround previous approaches.

        Show
        Grant Ingersoll added a comment - I really don't like having separate APIs for one doc vs. multiple docs. How about deprecating the existing approach in favor a new one that properly captures commands, single docs, and multiple docs and is clean for users? The whole point of stuff like this is to make it easier for people first coming to Solr to not have to think about all the gotchas and just get their data in. Ease of use has to be all about the user's data and their questions for it, not about idiosyncrasies in API design to workaround previous approaches.
        Hide
        Hoss Man added a comment -

        I really don't like having separate APIs for one doc vs. multiple docs...

        I don't think anyone suggested that?

        what was suggested was a new request param (i suggested json.command=true|false), that would indicate when a top level JSON object should be interpreted as a set of commands, or as a single document.

        if this request param exists, then a new sample configuration could be supplied (noble suggested /update/json/doc - i would suggest /update/json/docs (plural) might be better) which could have a <lst name="defaults"/> block setting json.command=true ... which would mean you could send either one doc or multiple docs to that endpoint, and they would just plain work. if you use /update or /update/json then for backcompat reasons, a top level JSON object would be interpreted as a list of commands – but you could override that with json.command=true in the request.

        How about deprecating the existing approach in favor a new one that properly captures commands, single docs, and multiple docs and is clean for users?

        I'm not sure i understand what you are proposing? .. there are only 3 "top level" constructs we could have in a valid JSON document: arrays, objects, and literals – if you want both top level objects / top level arrays to be interpreted as a doc / list of docs that doesn't leave much to put commands (only the most trivial command could be specified by a top level string/number primitive.

        If folks feel like sending single documents should be the "default" and commands should be considered abnormal (a sentiment i generally agree with, particularly since most basic commands can already be sent as request params) then i already made a suggestion to how to move forward with that goal:

        we can also change the default value of the new request param (and thus, the default behavior) in 5.0 so that by default we assume everything is a doc, and you have to send "json.command=true" if you want your top level braces to be interpreted as starting a set of commands.


        Ease of use has to be all about the user's data and their questions for it, not about idiosyncrasies in API design to workaround previous approaches.

        Nothing in the suggestion Noble and i coalesced towards has anything to do with "working around" the existing API – it's about ensuring that as the API evolves to be more freindly towards new users, we don't alienate existing users by breaking their shit if we don't have to, and giving them an easy way to edit their config to make their shit keep working if we change the default behavior.

        Show
        Hoss Man added a comment - I really don't like having separate APIs for one doc vs. multiple docs... I don't think anyone suggested that? what was suggested was a new request param (i suggested json.command=true|false ), that would indicate when a top level JSON object should be interpreted as a set of commands, or as a single document. if this request param exists, then a new sample configuration could be supplied (noble suggested /update/json/doc - i would suggest /update/json/docs (plural) might be better) which could have a <lst name="defaults"/> block setting json.command=true ... which would mean you could send either one doc or multiple docs to that endpoint, and they would just plain work. if you use /update or /update/json then for backcompat reasons, a top level JSON object would be interpreted as a list of commands – but you could override that with json.command=true in the request. How about deprecating the existing approach in favor a new one that properly captures commands, single docs, and multiple docs and is clean for users? I'm not sure i understand what you are proposing? .. there are only 3 "top level" constructs we could have in a valid JSON document: arrays, objects, and literals – if you want both top level objects / top level arrays to be interpreted as a doc / list of docs that doesn't leave much to put commands (only the most trivial command could be specified by a top level string/number primitive. If folks feel like sending single documents should be the "default" and commands should be considered abnormal (a sentiment i generally agree with, particularly since most basic commands can already be sent as request params) then i already made a suggestion to how to move forward with that goal: we can also change the default value of the new request param (and thus, the default behavior) in 5.0 so that by default we assume everything is a doc, and you have to send "json.command=true" if you want your top level braces to be interpreted as starting a set of commands. Ease of use has to be all about the user's data and their questions for it, not about idiosyncrasies in API design to workaround previous approaches. Nothing in the suggestion Noble and i coalesced towards has anything to do with "working around" the existing API – it's about ensuring that as the API evolves to be more freindly towards new users, we don't alienate existing users by breaking their shit if we don't have to, and giving them an easy way to edit their config to make their shit keep working if we change the default behavior.
        Hide
        Noble Paul added a comment -

        Hoss more or less summed up what I was proposing

        key points

        • The current json object format should not be broken.
        • The /update/json/docs or /update/json/doc (Plural is better because it accepts many) is a shorthand for /update/json?json.command=false . Actually users are free to create clean http endpoints by adding a <requestHandler> entry in solrconfig.xml with the params added to defaults . In this case we are adding this ourselves. This is why I created SOLR-6302 also . We should not make our solrconfig.xml ugly by adding stuff which everyone needs and no one should modify (other than experts)

        we can also change the default value of the new request param (and thus, the default behavior) in 5.0

        I don't think we will ever be able to change that default without screwing a lot of users .

        Show
        Noble Paul added a comment - Hoss more or less summed up what I was proposing key points The current json object format should not be broken. The /update/json/docs or /update/json/doc (Plural is better because it accepts many) is a shorthand for /update/json?json.command=false . Actually users are free to create clean http endpoints by adding a <requestHandler> entry in solrconfig.xml with the params added to defaults . In this case we are adding this ourselves. This is why I created SOLR-6302 also . We should not make our solrconfig.xml ugly by adding stuff which everyone needs and no one should modify (other than experts) we can also change the default value of the new request param (and thus, the default behavior) in 5.0 I don't think we will ever be able to change that default without screwing a lot of users .
        Hide
        Noble Paul added a comment -

        Fix for both the tickets

        It is OK if we want to split them

        Show
        Noble Paul added a comment - Fix for both the tickets It is OK if we want to split them
        Hide
        Noble Paul added a comment -

        I shall go ahead and commit it soon, if there are no objections

        Show
        Noble Paul added a comment - I shall go ahead and commit it soon, if there are no objections
        Hide
        ASF subversion and git services added a comment -

        Commit 1615345 from Noble Paul in branch 'dev/trunk'
        [ https://svn.apache.org/r1615345 ]

        SOLR-6294 ,SOLR-6302

        Show
        ASF subversion and git services added a comment - Commit 1615345 from Noble Paul in branch 'dev/trunk' [ https://svn.apache.org/r1615345 ] SOLR-6294 , SOLR-6302
        Hide
        ASF subversion and git services added a comment -

        Commit 1615347 from Noble Paul in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1615347 ]

        SOLR-6294 ,SOLR-6302

        Show
        ASF subversion and git services added a comment - Commit 1615347 from Noble Paul in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1615347 ] SOLR-6294 , SOLR-6302

          People

          • Assignee:
            Noble Paul
            Reporter:
            Noble Paul
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development