Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      In addition to supporting xml and csv updating, it would be good to support json.

      This patch uses noggit, a streaming json parser, to build the commands.

        Issue Links

          Activity

          Hide
          Ryan McKinley added a comment -

          Here is a patch that lets you update sending documents that look like this:

          { 
          
          "add": {
            "doc": {
              "f0": "v0",
              "f2": {
                "boost": 2.3,
                "value": "test"
              },
              "array": [ "aaa", "bbb" ],
              "boosted": {
                "boost": 6.7,
                "value": [ "aaa", "bbb" ]
              }
            }
          },
          "add": {
            "commitWithin": 1234,
            "overwrite": false,
            "boost": 3.45,
            "doc": {
              "f1": "v1",
              "f1": "v2"
            }
          },
          
          "commit": {},
          "optimize": { "waitFlush":false, "waitSearcher":false },
          
          "delete": { "id":"ID" },
          "delete": { "query":"QUERY" },
          "rollback": {}
          
          }
          
          
          Show
          Ryan McKinley added a comment - Here is a patch that lets you update sending documents that look like this: { "add" : { "doc" : { "f0" : "v0" , "f2" : { "boost" : 2.3, "value" : "test" }, "array" : [ "aaa" , "bbb" ], "boosted" : { "boost" : 6.7, "value" : [ "aaa" , "bbb" ] } } }, "add" : { "commitWithin" : 1234, "overwrite" : false , "boost" : 3.45, "doc" : { "f1" : "v1" , "f1" : "v2" } }, "commit" : {}, "optimize" : { "waitFlush" : false , "waitSearcher" : false }, "delete" : { "id" : "ID" }, "delete" : { "query" : "QUERY" }, "rollback" : {} }
          Hide
          Otis Gospodnetic added a comment -

          Out of curiosity, is there a benefit to using JSON over XML for indexing/updating? Perhaps noggit is (much?) faster than the XML parser Solr uses and this has noticeable difference? (though I'd guess that indexing itself is what takes most of the time)

          Show
          Otis Gospodnetic added a comment - Out of curiosity, is there a benefit to using JSON over XML for indexing/updating? Perhaps noggit is (much?) faster than the XML parser Solr uses and this has noticeable difference? (though I'd guess that indexing itself is what takes most of the time)
          Hide
          Ryan McKinley added a comment -

          I have not benchmarked anything yet... current motivation is interface rather then speed (though it is potentially faster)

          Show
          Ryan McKinley added a comment - I have not benchmarked anything yet... current motivation is interface rather then speed (though it is potentially faster)
          Hide
          Yonik Seeley added a comment -

          Yes, the parsing is much faster in noggit vs parsing XML (or vs other JSON parsers for that matter). Not sure what the split between parsing/indexing... I imagine/hope that more time is spent in indexing.

          Two more benefits:

          • smaller footprint... less network IO
          • able to represent the entire unicode range
          Show
          Yonik Seeley added a comment - Yes, the parsing is much faster in noggit vs parsing XML (or vs other JSON parsers for that matter). Not sure what the split between parsing/indexing... I imagine/hope that more time is spent in indexing. Two more benefits: smaller footprint... less network IO able to represent the entire unicode range
          Hide
          Yonik Seeley added a comment -

          The API looks like a direct translation of the XML API.... that's a reasonable approach, but we should also take this chance to revisit and see what we might want to change.

          If we were to do it over again (now that we can grab params from the URL in a POST), would we prefer removing the adjectives like "add" and some of the other parameters from the XML?

          http://localhost:8983/solr/update/add?commitWithin=1234
          
          {
            "docs":[
               { "f0": "v0",
                  "f2": {
                  "boost": 2.3,
                  "value": "test"}
               },
               { "fo":"zzz",
                 "f1":"ggg"
               }
            ]
          }
            
          
          for deletes,
          http://localhost:8983/solr/update/delete?q=foo:1234   (or /update?delete=foo:1234)
          
          Show
          Yonik Seeley added a comment - The API looks like a direct translation of the XML API.... that's a reasonable approach, but we should also take this chance to revisit and see what we might want to change. If we were to do it over again (now that we can grab params from the URL in a POST), would we prefer removing the adjectives like "add" and some of the other parameters from the XML? http: //localhost:8983/solr/update/add?commitWithin=1234 { "docs" :[ { "f0" : "v0" , "f2" : { "boost" : 2.3, "value" : "test" } }, { "fo" : "zzz" , "f1" : "ggg" } ] } for deletes, http: //localhost:8983/solr/update/delete?q=foo:1234 (or /update?delete=foo:1234)
          Hide
          Ryan McKinley added a comment -

          Is this something we should consider for 1.4? Since Grant refactored all the XmlLoader stuff, this is a pretty simple extension.

          The only real issue i see is how to include the noggit library? Since noggit is in apache labs, it can not have a release there. We could:
          1. build a jar and release it as "apache-solr-noggit.jar" the same way we do with commons-csv
          2. move noggit to sourceforge and release from there.

          #1 seems easier to me.

          Show
          Ryan McKinley added a comment - Is this something we should consider for 1.4? Since Grant refactored all the XmlLoader stuff, this is a pretty simple extension. The only real issue i see is how to include the noggit library? Since noggit is in apache labs, it can not have a release there. We could: 1. build a jar and release it as "apache-solr-noggit.jar" the same way we do with commons-csv 2. move noggit to sourceforge and release from there. #1 seems easier to me.
          Hide
          Ryan McKinley added a comment -

          Re: API revist...

          For better or worse, this patch matches the JSON format with the XXXUpdateCommand classes. Unlike XML, each add document requires a new add statement. I did this since adding the document boost gets really clumsy.

          "add": {
            "commitWithin": 1234,
            "overwrite": false,
            "boost": 3.45,
            "doc": {
              "f1": "v1",
              "f1": "v2"
            }
          

          Adding the document boost to your example gets a bit ugly:

          "docs":[
               { "boost": 2,
                  "fields": { "f0": "v0",  "f1": 2.4 }
               },
               { "boost": 3,
                  "fields": { "f0": "v0",  "f1": 2.4 }
               },
            ]
          

          Personally, I like having the entire command encompassed in JSON rather then spreading it between the query args and the post body. I like this since all commands can be represented sequentially and clearly. Also it allows for easier streamming.

          For the 'add' command, I don't think we make things much easier/clearer by adding args.

          I agree a more RESTfull API is a good thing, but I think that is a separate task. For that, we should look at supporting HTTP GET/PUT/DELETE as the main control structures rather then passing params.

          For the XmlUpdateRequestHandler we added some arguments to the query string so that we could call "commit" in the same request that we send documents. In retrospect I'm not sure that was a good idea. We could achieve the same thing with:

          <commands>
            <add>
               ...
            </add>
            <commit />
          </commands>
          

          (this is currently supported by the XmlUpdateRequestHandler, since it only starts parsing after it hits known commands (add, commit, etc)

          Show
          Ryan McKinley added a comment - Re: API revist... For better or worse, this patch matches the JSON format with the XXXUpdateCommand classes. Unlike XML, each add document requires a new add statement. I did this since adding the document boost gets really clumsy. "add" : { "commitWithin" : 1234, "overwrite" : false , "boost" : 3.45, "doc" : { "f1" : "v1" , "f1" : "v2" } Adding the document boost to your example gets a bit ugly: "docs" :[ { "boost" : 2, "fields" : { "f0" : "v0" , "f1" : 2.4 } }, { "boost" : 3, "fields" : { "f0" : "v0" , "f1" : 2.4 } }, ] Personally, I like having the entire command encompassed in JSON rather then spreading it between the query args and the post body. I like this since all commands can be represented sequentially and clearly. Also it allows for easier streamming. For the 'add' command, I don't think we make things much easier/clearer by adding args. I agree a more RESTfull API is a good thing, but I think that is a separate task. For that, we should look at supporting HTTP GET/PUT/DELETE as the main control structures rather then passing params. For the XmlUpdateRequestHandler we added some arguments to the query string so that we could call "commit" in the same request that we send documents. In retrospect I'm not sure that was a good idea. We could achieve the same thing with: <commands> <add> ... </add> <commit /> </commands> (this is currently supported by the XmlUpdateRequestHandler, since it only starts parsing after it hits known commands (add, commit, etc)
          Hide
          Yonik Seeley added a comment -

          Is this something we should consider for 1.4?

          I think so... it's easy to understand both the API and the impact at a glance.

          1. build a jar and release it as "apache-solr-noggit.jar" the same way we do with commons-csv

          +1

          Show
          Yonik Seeley added a comment - Is this something we should consider for 1.4? I think so... it's easy to understand both the API and the impact at a glance. 1. build a jar and release it as "apache-solr-noggit.jar" the same way we do with commons-csv +1
          Hide
          Matt Weber added a comment -

          Any update on this for 1.4?

          +1 here.

          Show
          Matt Weber added a comment - Any update on this for 1.4? +1 here.
          Hide
          Yonik Seeley added a comment -

          Any objections to committing this to trunk soon?

          Show
          Yonik Seeley added a comment - Any objections to committing this to trunk soon?
          Hide
          Yonik Seeley added a comment -

          Committed, after changing the test to use a string instead of a file loaded from the classloader (which failed from my IDE for some reason).

          Show
          Yonik Seeley added a comment - Committed, after changing the test to use a string instead of a file loaded from the classloader (which failed from my IDE for some reason).
          Hide
          Paul Dlug added a comment -

          Is there any chance this could be backported to branch_3x? (Perhaps even 1.4.x in time for 1.4.1?)

          Show
          Paul Dlug added a comment - Is there any chance this could be backported to branch_3x? (Perhaps even 1.4.x in time for 1.4.1?)
          Hide
          Yonik Seeley added a comment -

          Yeah, this should be able to go back to the 3x branch (1.4.x is bugfix only though).

          Show
          Yonik Seeley added a comment - Yeah, this should be able to go back to the 3x branch (1.4.x is bugfix only though).
          Hide
          Yonik Seeley added a comment -

          OK, I just backported this to the 3x branch.

          Show
          Yonik Seeley added a comment - OK, I just backported this to the 3x branch.
          Hide
          Steve Rowe added a comment -

          reopen to add 3.1 fix version

          Show
          Steve Rowe added a comment - reopen to add 3.1 fix version

            People

            • Assignee:
              Yonik Seeley
              Reporter:
              Ryan McKinley
            • Votes:
              6 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development