Solr
  1. Solr
  2. SOLR-1162

SolrJ UpdateRequest does not maintain order of operations when sending mixed types of changes (updates, delete id, delete query, update iterator)

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3
    • Fix Version/s: None
    • Component/s: clients - java
    • Labels:
      None

      Description

      In SolrJ UpdateRequest object it maintains separate lists of documents to add, delete, and delete queries so that the order of those operations is not known to the caller. It really should execute the items in the same order they were added to the UpdateRequest.

      1. Solr-1162.patch
        41 kB
        Jayson Minard
      2. Solr-1162.patch
        41 kB
        Jayson Minard
      3. Solr-1162.patch
        37 kB
        Jayson Minard

        Issue Links

          Activity

          Hide
          Hoss Man added a comment -

          There is no indication that anyone is actively working on this issue, so removing 4.0 from the fixVersion.

          Show
          Hoss Man added a comment - There is no indication that anyone is actively working on this issue, so removing 4.0 from the fixVersion.
          Hide
          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Show
          Robert Muir added a comment - rmuir20120906-bulk-40-change
          Hide
          Hoss Man added a comment -

          bulk fixing the version info for 4.0-ALPHA and 4.0 all affected issues have "hoss20120711-bulk-40-change" in comment

          Show
          Hoss Man added a comment - bulk fixing the version info for 4.0-ALPHA and 4.0 all affected issues have "hoss20120711-bulk-40-change" in comment
          Hide
          Robert Muir added a comment -

          3.4 -> 3.5

          Show
          Robert Muir added a comment - 3.4 -> 3.5
          Hide
          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Show
          Robert Muir added a comment - Bulk move 3.2 -> 3.3
          Hide
          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Show
          Hoss Man added a comment - Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed. A unique token for finding these 240 issues in the future: hossversioncleanup20100527
          Hide
          Jayson Minard added a comment -

          Yikes, set to 1.5 – I better revamp this to current code.

          Show
          Jayson Minard added a comment - Yikes, set to 1.5 – I better revamp this to current code.
          Hide
          Jayson Minard added a comment -

          Sounds good. I'll watch SOLR-1164 for updates.

          Show
          Jayson Minard added a comment - Sounds good. I'll watch SOLR-1164 for updates.
          Hide
          Noble Paul added a comment -

          I shall update SOLR-1164 and you can take it over from there

          Show
          Noble Paul added a comment - I shall update SOLR-1164 and you can take it over from there
          Hide
          Jayson Minard added a comment -

          Updated patch to remove return value from the unmarshall call as suggested above.

          Show
          Jayson Minard added a comment - Updated patch to remove return value from the unmarshall call as suggested above.
          Hide
          Jayson Minard added a comment -

          Sounds good, didn't like that returned object in the old method anyway...

          I'll update the unmarshal and unit test that used that object to ensure it was accurate as well.

          What do you want to do about the deserialization? I think it was unsafe before in that it looked up the lists by names and made assumptions about the parms being first.

          Show
          Jayson Minard added a comment - Sounds good, didn't like that returned object in the old method anyway... I'll update the unmarshal and unit test that used that object to ensure it was accurate as well. What do you want to do about the deserialization? I think it was unsafe before in that it looked up the lists by names and made assumptions about the parms being first.
          Hide
          Noble Paul added a comment -

          we need a revamp of the JavaBinUpdateRequestCodec .

          The way the deserialization is done is not safe .

          
                    if (name.equals("params")) {
                      NamedList parmsAsList = (NamedList)val;
                      ModifiableSolrParams parms = namedListToSolrParams(parmsAsList);
                      updateRequest.setParams(parms);
                    }
                    else if (name.equals("delById")) {
                      List<String> delIds = (List<String>)val;
                      streamer.deleteById(delIds);
                    }
                    else if (name.equals("delByQ")) {
                      String delQuery = (String)val;
                      streamer.deleteByQuery(delQuery);
                    }
          

          may break if there are other objects with these names . I can provide a true streaming unmarshaller for JavabinCodec

          Let us change the method

          public UpdateRequest unmarshal(InputStream is, final StreamingDocumentHandler handler)
          

          to

          public void unmarshal(InputStream is, final StreamingDocumentHandler handler)
          

          and make the handler mandatory. so it does not have to worry about the type of Object

          Show
          Noble Paul added a comment - we need a revamp of the JavaBinUpdateRequestCodec . The way the deserialization is done is not safe . if (name.equals( "params" )) { NamedList parmsAsList = (NamedList)val; ModifiableSolrParams parms = namedListToSolrParams(parmsAsList); updateRequest.setParams(parms); } else if (name.equals( "delById" )) { List< String > delIds = (List< String >)val; streamer.deleteById(delIds); } else if (name.equals( "delByQ" )) { String delQuery = ( String )val; streamer.deleteByQuery(delQuery); } may break if there are other objects with these names . I can provide a true streaming unmarshaller for JavabinCodec Let us change the method public UpdateRequest unmarshal(InputStream is, final StreamingDocumentHandler handler) to public void unmarshal(InputStream is, final StreamingDocumentHandler handler) and make the handler mandatory. so it does not have to worry about the type of Object
          Hide
          Jayson Minard added a comment -

          UpdateRequest is used on both ends of the stream potentially so we would want to break out the server pieces to not use the same un-ordered object for binary codec. I could go either way, just never saw a need for unordered to remain but then again there is a whole community out there that might!

          Show
          Jayson Minard added a comment - UpdateRequest is used on both ends of the stream potentially so we would want to break out the server pieces to not use the same un-ordered object for binary codec. I could go either way, just never saw a need for unordered to remain but then again there is a whole community out there that might!
          Hide
          Noble Paul added a comment -

          Jayson, I see your point, Instead of changing the current UpdateRequest , isn't it better to add a new one OrderedUpdateRequest for this purpose?

          Show
          Noble Paul added a comment - Jayson, I see your point, Instead of changing the current UpdateRequest , isn't it better to add a new one OrderedUpdateRequest for this purpose?
          Hide
          Yonik Seeley added a comment -

          I haven't looked at the patch, but I agree with the rational. One would not be able to use multiple requests in parallel because of uncertain ordering of their execution.

          Show
          Yonik Seeley added a comment - I haven't looked at the patch, but I agree with the rational. One would not be able to use multiple requests in parallel because of uncertain ordering of their execution.
          Hide
          Jayson Minard added a comment -

          Outside of UpdateRequest if you hand construct a binary codec stream to send to Solr (without this patch) your order of actions would not be maintained within the stream. So the binary streaming update handler is "broken" in this regard as well.

          So this patch actually resolves two issue: UpdateRequest does not serialize into xml or binary stream the actions in-order. Nor does BinaryUpdateRequestHandler nor did the underlaying JavaBinUpdateRequestCodec so these all become unusable for mixed style actions since none can maintain order. UpdateRequest is just one of the clients to that server-side code that also has the same issue.

          Show
          Jayson Minard added a comment - Outside of UpdateRequest if you hand construct a binary codec stream to send to Solr (without this patch) your order of actions would not be maintained within the stream. So the binary streaming update handler is "broken" in this regard as well. So this patch actually resolves two issue: UpdateRequest does not serialize into xml or binary stream the actions in-order. Nor does BinaryUpdateRequestHandler nor did the underlaying JavaBinUpdateRequestCodec so these all become unusable for mixed style actions since none can maintain order. UpdateRequest is just one of the clients to that server-side code that also has the same issue.
          Hide
          Jayson Minard added a comment -

          Multiple requests are less efficient than sending large batches together.

          To be the most efficient with large requests, every user of SolrJ UpdateRequest would need to write the same logic... Place adds into UpdateRequest until you hit the first non-add, then send the UpdateRequest and start writing your deletes until you hit a non-delete, then flush the UpdateRequest and keep adding your new transaction type until you hit the first ... In that case they should avoid using UpdateRequest altogether as calling the SolrServer directly is just as "easy." If we are going to batch on their behalf why wouldn't we do it correctly and be predictable with our ordering. I'm sure if JDBC batches did not maintain order, there would be havoc to pay...

          Besides that, it isn't clear to users of UpdateRequest as to the order of operations, so someone doing an Add doc 1, Delete doc 1, Add doc 1 may not end up with the expected outcome. It turns into Add doc 1, Add doc 1, Delete doc1 when streaming and similary for XML version of the transaction. If I did a Delete Query : then Add doc1, Add doc 2 I end up with no docs as the delete query comes last, but I (the user) does not know that.

          I've written code to work around UpdateRequest ordering and I usually end up only using it for commitWithin or having a commit tacked on the end of the request due to the above issues.

          Show
          Jayson Minard added a comment - Multiple requests are less efficient than sending large batches together. To be the most efficient with large requests, every user of SolrJ UpdateRequest would need to write the same logic... Place adds into UpdateRequest until you hit the first non-add, then send the UpdateRequest and start writing your deletes until you hit a non-delete, then flush the UpdateRequest and keep adding your new transaction type until you hit the first ... In that case they should avoid using UpdateRequest altogether as calling the SolrServer directly is just as "easy." If we are going to batch on their behalf why wouldn't we do it correctly and be predictable with our ordering. I'm sure if JDBC batches did not maintain order, there would be havoc to pay... Besides that, it isn't clear to users of UpdateRequest as to the order of operations, so someone doing an Add doc 1, Delete doc 1, Add doc 1 may not end up with the expected outcome. It turns into Add doc 1, Add doc 1, Delete doc1 when streaming and similary for XML version of the transaction. If I did a Delete Query : then Add doc1, Add doc 2 I end up with no docs as the delete query comes last, but I (the user) does not know that. I've written code to work around UpdateRequest ordering and I usually end up only using it for commitWithin or having a commit tacked on the end of the request due to the above issues.
          Hide
          Noble Paul added a comment -

          I took a look at the patch . Looks fine , there are a lot of changes.

          But I am still not convinced of the usecase , considering that you can achieve the same with multiple requests easily. Could you help with a usecase

          Show
          Noble Paul added a comment - I took a look at the patch . Looks fine , there are a lot of changes. But I am still not convinced of the usecase , considering that you can achieve the same with multiple requests easily. Could you help with a usecase
          Hide
          Jayson Minard added a comment -

          Updated patch to fix commitWithin parameter in UpdateRequest and failing unit test.

          Show
          Jayson Minard added a comment - Updated patch to fix commitWithin parameter in UpdateRequest and failing unit test.
          Hide
          Jayson Minard added a comment -

          Also (for Noble Paul)...

          UpdateRequest is the invoker of itself via the process method, there is nothing to which you can pass it as a list.

          UpdateRequest req = new UpdateRequest();
          ... add things
          req.process(solrServer);

          Show
          Jayson Minard added a comment - Also (for Noble Paul)... UpdateRequest is the invoker of itself via the process method, there is nothing to which you can pass it as a list. UpdateRequest req = new UpdateRequest(); ... add things req.process(solrServer);
          Hide
          Jayson Minard added a comment -

          Current patch (12-May-2009 11:30PM) is not tested with live server, only against the unit test TestUpdateRequestCodec.

          Show
          Jayson Minard added a comment - Current patch (12-May-2009 11:30PM) is not tested with live server, only against the unit test TestUpdateRequestCodec.
          Hide
          Jayson Minard added a comment -

          In response to Noble Paul:

          An UpdateRequest currently contains: a list of adds, a single add via iterator, a list of delete by ids, a list of delete queries. So you can pile them all into the same object currently and then have no idea what order they will actually execute. If you want to stream adds, you cannot intermix deletes or other actions in the same stream. So if UpdateRequest is going to allow a set of different actions it should at least maintain the order in which they were added and execute them similarly.

          UpdateRequest is the current batching model, so it should be correct.

          Show
          Jayson Minard added a comment - In response to Noble Paul: An UpdateRequest currently contains: a list of adds, a single add via iterator, a list of delete by ids, a list of delete queries. So you can pile them all into the same object currently and then have no idea what order they will actually execute. If you want to stream adds, you cannot intermix deletes or other actions in the same stream. So if UpdateRequest is going to allow a set of different actions it should at least maintain the order in which they were added and execute them similarly. UpdateRequest is the current batching model, so it should be correct.
          Hide
          Jayson Minard added a comment -

          First round of patch, changed the UpdateRequest to maintain one list of different types of request items, then changed XML serialization to keep the same order, and the same for binary codec. Then updated the streaming side of the binary update handler to stream all request types (not just doc adds) and updated the unit test to verify order is maintained both when streaming or not streaming using the binary codec.

          Show
          Jayson Minard added a comment - First round of patch, changed the UpdateRequest to maintain one list of different types of request items, then changed XML serialization to keep the same order, and the same for binary codec. Then updated the streaming side of the binary update handler to stream all request types (not just doc adds) and updated the unit test to verify order is maintained both when streaming or not streaming using the binary codec.
          Hide
          Noble Paul added a comment -

          isn't it better to use multiple UpdateRequest to go in one Request and you can have a method which takes in an array of UpdateRequests.

          Show
          Noble Paul added a comment - isn't it better to use multiple UpdateRequest to go in one Request and you can have a method which takes in an array of UpdateRequests.
          Hide
          Jayson Minard added a comment -

          Working on this patch now...

          Show
          Jayson Minard added a comment - Working on this patch now...

            People

            • Assignee:
              Noble Paul
              Reporter:
              Jayson Minard
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:

                Development