Solr / SOLR-2496

JSON Update Handler doesn't handle multiple docs properly

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1
    • Fix Version/s: 3.2
    • Component/s: update
    • Labels:

      Description

      The following is the current Solr 3.1 format for sending multiple documents as JSON. It isn't analogous to the XML method, and it can't easily be generated by serializing a hash in Perl, Python, Ruby, and similar languages, because it repeats the "add" key.

      It's cited at this page: http://wiki.apache.org/solr/UpdateJSON
      Near the text: "Here's a simple example of adding more than one document at once:"

      {
          "add": {"doc": {"id" : "TestDoc1", "title" : "test1"} },
          "add": {"doc": {"id" : "TestDoc2", "title" : "another test"} }
      }
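
To illustrate why the repeated "add" key is a problem, here is a minimal Python sketch using only the standard json module. The variable names are illustrative and not part of Solr. When the payload is parsed into a dict, only the last "add" entry survives, so a client that builds its request from a hash has no way to produce this format.

```python
import json

# The Solr 3.1 multi-document payload from the wiki, with its repeated "add" key.
payload = """
{
    "add": {"doc": {"id": "TestDoc1", "title": "test1"}},
    "add": {"doc": {"id": "TestDoc2", "title": "another test"}}
}
"""

# json.loads accepts the text, but a dict can hold only one "add" entry,
# so the last occurrence wins and TestDoc1 is lost.
parsed = json.loads(payload)
print(parsed)

# object_pairs_hook exposes every key/value pair in order, which shows
# that the text itself really does carry two "add" commands.
pairs = json.loads(payload, object_pairs_hook=list)
print([key for key, _ in pairs])
```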
      

      Here's a better format that's analogous to the XML method of submission, and is easily serialized from a hash to JSON:

      {
          "add": {
              "doc": [
                  {"id" : "TestDoc1", "title" : "test1"},
                  {"id" : "TestDoc2", "title" : "another test"}
              ]
          }
      }
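
As a sketch of the "easily serialized from a hash" point, the proposed structure can be produced directly from ordinary Python data. This only demonstrates the format proposed in this description; it does not claim that any Solr release accepts exactly this shape, and the sample documents are the ones used above.

```python
import json

# Documents held in an ordinary list of dicts (sample data from this issue).
docs = [
    {"id": "TestDoc1", "title": "test1"},
    {"id": "TestDoc2", "title": "another test"},
]

# One "add" key whose "doc" value is an array, as proposed in this issue.
body = json.dumps({"add": {"doc": docs}}, indent=4)
print(body)

# Parsing the result gives back the same documents, because no key is repeated.
round_tripped = json.loads(body)["add"]["doc"]
print(round_tripped == docs)
```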
      

      The original XML method:

      <add>
          <doc>
             <field name="id">TestDoc1</field><field name="title">"test1"</field>
          </doc>
          <doc>
             <field name="id">TestDoc2</field><field name="title">"test2"</field>
          </doc>
      </add>
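
For comparison, the XML form maps naturally onto a list of documents: one add element wrapping one doc element per document. The following is a rough sketch with Python's standard xml.etree.ElementTree; the docs list is sample data, and the titles are written without the literal quote characters shown in the XML above.

```python
import xml.etree.ElementTree as ET

# Sample documents (illustrative values based on this issue).
docs = [
    {"id": "TestDoc1", "title": "test1"},
    {"id": "TestDoc2", "title": "test2"},
]

# A single <add> element containing one <doc> per document.
add = ET.Element("add")
for doc in docs:
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # Each field becomes <field name="...">value</field>.
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = value

xml_body = ET.tostring(add, encoding="unicode")
print(xml_body)
```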
      
      1. SOLR-2496.patch
        22 kB
        Yonik Seeley

        Activity

        Neil Hooey created issue -
        Neil Hooey made changes -
        Field         Original Value   New Value
        Issue Type    Bug [ 1 ]        Improvement [ 4 ]
        Neil Hooey made changes -
        Original Estimate 4h [ 14400 ]
        Remaining Estimate 4h [ 14400 ]
        Yonik Seeley added a comment -

        Yeah, I agree we should be able to add multiple docs without having to repeat tags in the same hash/object.
        I proposed something like what you have; the original thinking behind the current format is in this issue: SOLR-945

        Yonik Seeley added a comment -

        Here's a patch that extends the current syntax with a simplified syntax that allows an array of documents at the top level or inside an "add" command.
        It also adds the ability to specify "commitWithin" and "overwrite" on the URL (same as the CSVLoader).

        Examples of new simplified syntax:

        [
          {"id":"1"},
          {"id":"2"}
        ]

        {"add":[
          {"id":"1"},
          {"id":"2"}
        ]}
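
The two simplified forms and the URL parameters described in this comment could be exercised from a client roughly as follows. This is a sketch under stated assumptions: the host, port, and handler path are placeholders for a local install, the commitWithin value is assumed to be in milliseconds, and the request is only constructed, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

docs = [{"id": "1"}, {"id": "2"}]

# Form 1: a bare array of documents at the top level.
top_level_body = json.dumps(docs)

# Form 2: an array of documents inside a single "add" command.
add_body = json.dumps({"add": docs})

# ASSUMPTION: host, port, and handler path are placeholders for a local install.
base_url = "http://localhost:8983/solr/update/json"

# "commitWithin" and "overwrite" go on the URL, as the comment describes.
# ASSUMPTION: commitWithin is given in milliseconds.
params = urlencode({"commitWithin": 10000, "overwrite": "true"})

# Build (but do not send) a POST request carrying the top-level array form.
request = Request(
    f"{base_url}?{params}",
    data=top_level_body.encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

print(request.full_url)
print(top_level_body)
print(add_body)
```

Either body can be passed as the request data; calling urllib.request.urlopen(request) would send it to a running server.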

        Yonik Seeley made changes -
        Attachment SOLR-2496.patch [ 12478732 ]
        yonik committed 1124356 (2 files)
        Yonik Seeley made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 3.2 [ 12316172 ]
        Resolution Fixed [ 1 ]
        Neil Hooey added a comment -

        Awesome, thanks Yonik!

        Robert Muir added a comment -

        Bulk close for 3.2

        Robert Muir made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee: Unassigned
          • Reporter: Neil Hooey
          • Votes: 0
          • Watchers: 0
