Details

      Description

      In order increase the ability of Solr to be used as a NoSql database (lots of concurrent inserts, updates, deletes and queries in the entire lifetime of the index) instead of just a search index (first: everything indexed (in one thread), after: only queries), I would like Solr to support versioning to be used for optimistic locking.
      When my intent (see SOLR-3173) is to update an existing document, I will need to provide a version-number equal to "the version number I got when I fetched the existing document for update" plus one. If this provided version-number does not correspond to "the newest version-number of that document at the time of update" plus one, I will get a VersionConflict error. If it does correspond the document will be updated with the new one, so that "the newest version-number of that document" is NOW one higher than before the update. Correct but efficient concurrency handling.
      When my intent (see SOLR-3173) is to insert a new document, the version number provided will not be used - instead a version-number 0 will be used. According to SOLR-3173 insert will only succeed if a document with the same value on uniqueKey-field does not already exist.

      In general when talking about different versions of "the same document", of course we need to be able to identify when a document "is the same" - that, per definition, is when the values of the uniqueKey-fields are equal.

      The functionality provided by this issue is only really meaningfull when you run with "updateLog" activated.

      This issue might be solved more or less at the same time as SOLR-3173, and only one single SVN patch might be given to cover both issues.

      1. SOLR-3173_3178_3382_3428_plus.patch
        350 kB
        Per Steffensen
      2. SOLR-3173_3178_3382_3428_plus.patch
        335 kB
        Per Steffensen
      3. SOLR_3173_3178_3382_plus.patch
        300 kB
        Per Steffensen
      4. SOLR-3178.patch
        23 kB
        Yonik Seeley

        Issue Links

          Activity

          Hide
          Per Steffensen added a comment -

          More detailed descriptions of the added features in SOLR-3173 and SOLR-3178 here: http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics

          Show
          Per Steffensen added a comment - More detailed descriptions of the added features in SOLR-3173 and SOLR-3178 here: http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics
          Hide
          Yonik Seeley added a comment -

          One API issue is how to give error codes... looking at the HTTP list (http://en.wikipedia.org/wiki/List_of_HTTP_status_codes) there seem to be 2 that fit the bill:

          409 Conflict
          Indicates that the request could not be processed because of conflict in the request, such as an edit conflict.[2]

          412 Precondition Failed
          The server does not meet one of the preconditions that the requester put on the request.[2]

          Looks like 409 is closer though.

          Show
          Yonik Seeley added a comment - One API issue is how to give error codes... looking at the HTTP list ( http://en.wikipedia.org/wiki/List_of_HTTP_status_codes ) there seem to be 2 that fit the bill: 409 Conflict Indicates that the request could not be processed because of conflict in the request, such as an edit conflict. [2] 412 Precondition Failed The server does not meet one of the preconditions that the requester put on the request. [2] Looks like 409 is closer though.
          Hide
          Yonik Seeley added a comment -

          FYI, I've got a patch that seems to work fine... I'm just still finishing up some of the tests.

          Show
          Yonik Seeley added a comment - FYI, I've got a patch that seems to work fine... I'm just still finishing up some of the tests.
          Hide
          Per Steffensen added a comment -

          We have been working on it ourselves (SOLR-3173 and SOLR-3178), and will really prefer that we get to contribute our changes. Please have a look at http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics - review comments are very very welcome. We will be ready to contribute the entire solution within the next couple of weeks. It is already running pilot-production inhouse and nicely handling 100+ mio inserts/updates per day.
          We just decided to always use response code 400 (as you can see on the Wiki), but you suggestion about using other codes is good. I will hav a look at that.

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - We have been working on it ourselves ( SOLR-3173 and SOLR-3178 ), and will really prefer that we get to contribute our changes. Please have a look at http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics - review comments are very very welcome. We will be ready to contribute the entire solution within the next couple of weeks. It is already running pilot-production inhouse and nicely handling 100+ mio inserts/updates per day. We just decided to always use response code 400 (as you can see on the Wiki), but you suggestion about using other codes is good. I will hav a look at that. Regards, Per Steffensen
          Hide
          Per Steffensen added a comment -

          Changed the response codes to not always be 400. Comments and suggestions of changes are very welcome.

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - Changed the response codes to not always be 400. Comments and suggestions of changes are very welcome. Regards, Per Steffensen
          Hide
          Yonik Seeley added a comment -

          Here's a patch that seems to work fine.
          There's really no extra input-API - all you do is supply a version in the document (or as a parameter on the URL for a delete) and you get back a 409 if it's incorrect.

          I'll probably commit this soon and we can incrementally improve by adding better support to SolrJ (I'm not sure there's an easy way to specify the version for a delete from solrJ).

          Show
          Yonik Seeley added a comment - Here's a patch that seems to work fine. There's really no extra input-API - all you do is supply a version in the document (or as a parameter on the URL for a delete) and you get back a 409 if it's incorrect. I'll probably commit this soon and we can incrementally improve by adding better support to SolrJ (I'm not sure there's an easy way to specify the version for a delete from solrJ).
          Hide
          Per Steffensen added a comment -

          I will have a look at the patch provided by you and merge any ideas in there that is not already in our solution into our solution, but please do not commit your patch. I really hope you will wait until we provide our patch which implements everything described on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics and more - e.g. SolrServers that return Future<NamedList<Object>> instead of just NamedList<Object>, which is needed to make the error propagation also described on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics work for ConcurrentSolrServer. Everything in our contribution (the patch I deliver soon) will be fully covered by tests.

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - I will have a look at the patch provided by you and merge any ideas in there that is not already in our solution into our solution, but please do not commit your patch. I really hope you will wait until we provide our patch which implements everything described on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics and more - e.g. SolrServers that return Future<NamedList<Object>> instead of just NamedList<Object>, which is needed to make the error propagation also described on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics work for ConcurrentSolrServer. Everything in our contribution (the patch I deliver soon) will be fully covered by tests. Regards, Per Steffensen
          Hide
          Per Steffensen added a comment -

          I really hope you will give us a chance to contribute the solution to this SOLR-3178 and SOLR-3173. We have a solution that Im sure you will think is great - a solution that we will send to you within the next couple of weeks.

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - I really hope you will give us a chance to contribute the solution to this SOLR-3178 and SOLR-3173 . We have a solution that Im sure you will think is great - a solution that we will send to you within the next couple of weeks. Regards, Per Steffensen
          Hide
          Yonik Seeley added a comment -

          I think the best way to do this is really incrementally.
          What's done so far should really be a strict sub-set of everything you were thinking of (since changed to use 409 as the error code), and is useful on it's own.
          Don't worry... you'll get credit!

          Show
          Yonik Seeley added a comment - I think the best way to do this is really incrementally. What's done so far should really be a strict sub-set of everything you were thinking of (since changed to use 409 as the error code), and is useful on it's own. Don't worry... you'll get credit!
          Hide
          Per Steffensen added a comment - - edited

          I dont care about credit

          It is just that I put myself on as Assigned to indicate that I was working on it. And I have been - with a pause of a few weeks because my product owner prioritized me to look at something different (money talkes - you know ). I am soon ready to provide a full solution so why not wait for that. My solution is a little bit different than yours (just had a very short peek at your patch) but basically it is doing the same thing, but there is no need for both your and my solution. I really hope you will wait until you see my contribution - it is much more complete than what you have done until now, and we are already using it successfully in a highly concurrent pilot-production environment.

          My contribution solves at least a few problems that yours doesnt seem to solve (yet):

          • It (also) works for multi-document update request, providing fully detailed feedback to the client about which document-updates failed (and why) and which succeeded. I believe your solution will just result in a 409 HTTP response if just one of many documents in the same request failed, leaving no information to the user about which of the other documents in the request where successfully handled (and stored), and if more of the documents (comming after the on that did fail) would also have failed.
          • It works (have been tested by heavily concurrent tests) in highly concurrent environments. Looking shortly at you patch, I belive, that if two threads in the server JVM overlaps in a way that is unfortunate enough, then it is possible that data will not be stored or will be overwritten without an error being sent back to the client indicating that.
          • Feedback to client is even possible if using ConcurrentSolrServer, because it has been changed to provide a Future from "request" - a Future that will eventually provide the calling client with the result from the request.

          I understand your point about doing this incrementally, and I have done that internally. But since I am not a Solr committer it is hard to push too many small partly-done features. If I had been a committer you would have been able to follow the progress of my work more closely. But fact is, that I have a full and well tested solution that I would like to provide as a contribution very soon. I really hope you will wait for that, before committing a different partly done solution to the same problems. Thanks a lot!

          I would much rather want you to carefully read http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics and provide feedback and comments. I will deliver everything described there (and more) very soon. And then tell me how to ajust remaining time on this issue

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - - edited I dont care about credit It is just that I put myself on as Assigned to indicate that I was working on it. And I have been - with a pause of a few weeks because my product owner prioritized me to look at something different (money talkes - you know ). I am soon ready to provide a full solution so why not wait for that. My solution is a little bit different than yours (just had a very short peek at your patch) but basically it is doing the same thing, but there is no need for both your and my solution. I really hope you will wait until you see my contribution - it is much more complete than what you have done until now, and we are already using it successfully in a highly concurrent pilot-production environment. My contribution solves at least a few problems that yours doesnt seem to solve (yet): It (also) works for multi-document update request, providing fully detailed feedback to the client about which document-updates failed (and why) and which succeeded. I believe your solution will just result in a 409 HTTP response if just one of many documents in the same request failed, leaving no information to the user about which of the other documents in the request where successfully handled (and stored), and if more of the documents (comming after the on that did fail) would also have failed. It works (have been tested by heavily concurrent tests) in highly concurrent environments. Looking shortly at you patch, I belive, that if two threads in the server JVM overlaps in a way that is unfortunate enough, then it is possible that data will not be stored or will be overwritten without an error being sent back to the client indicating that. Feedback to client is even possible if using ConcurrentSolrServer, because it has been changed to provide a Future from "request" - a Future that will eventually provide the calling client with the result from the request. I understand your point about doing this incrementally, and I have done that internally. But since I am not a Solr committer it is hard to push too many small partly-done features. If I had been a committer you would have been able to follow the progress of my work more closely. But fact is, that I have a full and well tested solution that I would like to provide as a contribution very soon. I really hope you will wait for that, before committing a different partly done solution to the same problems. Thanks a lot! I would much rather want you to carefully read http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics and provide feedback and comments. I will deliver everything described there (and more) very soon. And then tell me how to ajust remaining time on this issue Regards, Per Steffensen
          Hide
          Yonik Seeley added a comment - - edited

          it is hard to push too many small partly-done features.

          As committers it's the opposite - it's sometimes hard to consume patches that do a lot of different things.

          It works with multi-document update request, providing fully detailed feedback to the client about which document-updates failed (and why) and which succeeded.

          That's great! But it's also a separate feature that we've needed for a while (and I think there has been a JIRA issue open for it for a while).

          Looking shortly at you patch, I belive, that if two threads in the server JVM overlaps in a way that is unfortunate enough, then it is possible that data will not be stored or will be overwritten without an exception being thrown to indicate that to the client.

          I'm confident in the synchronization/concurrency part - it's just reusing what was put in place for SolrCloud to handle reordering of updates to replicas and is very well tested (see TestRealTimeGet). But please let us know if you see a problem with it, as that would mean a problem with SolrCloud today (even without this patch).

          Feedback to client is even possible if using ConcurrentSolrServer, because it has been changed to provide a Future from "request" - a Future that will eventually provide the calling client with the result from the request.

          This sounds like another great feature! It really deserves it's own issue so people can more easily provide review/feedback and not have it coupled to this issue.

          Some other additions that can be handled later that I see:

          • SolrJ support for easier passing of version on a delete, constants for version, etc
          • Use of "1" as a generic "exists" version (i.e. update document only if it already exists)
          • If one document in a batch fails, don't automatically abort, and provide info back about which docs succeeded and which failed (your first point).

          That last one in particular needs some good discussion and design work and really deserves it's own issue.

          Down to the specifics of this patch... it's very non-invasive to core, consisting of the following code block (once for add, once for delete):

                      if (versionOnUpdate != 0) {
                        Long lastVersion = vinfo.lookupVersion(cmd.getIndexedId());
                        long foundVersion = lastVersion == null ? -1 : lastVersion;
                        if ( versionOnUpdate == foundVersion || (versionOnUpdate < 0 && foundVersion < 0) ) {
                          // we're ok if versions match, or if both are negative (all missing docs are equal)
                        } else {
                          throw new SolrException(ErrorCode.CONFLICT, "version conflict for " + cmd.getPrintableId() + " expected=" + versionOnUpdate + " actual=" + foundVersion);
                        }
                      }
          

          Having this current improvement committed in no way blocks any future improvements you may come up with (including deleting the code quoted above and using whatever method you have come up with for checking the versions), and it even uses the same API (or a subset of it) via version and 409. Progress, not perfection!

          Show
          Yonik Seeley added a comment - - edited it is hard to push too many small partly-done features. As committers it's the opposite - it's sometimes hard to consume patches that do a lot of different things. It works with multi-document update request, providing fully detailed feedback to the client about which document-updates failed (and why) and which succeeded. That's great! But it's also a separate feature that we've needed for a while (and I think there has been a JIRA issue open for it for a while). Looking shortly at you patch, I belive, that if two threads in the server JVM overlaps in a way that is unfortunate enough, then it is possible that data will not be stored or will be overwritten without an exception being thrown to indicate that to the client. I'm confident in the synchronization/concurrency part - it's just reusing what was put in place for SolrCloud to handle reordering of updates to replicas and is very well tested (see TestRealTimeGet). But please let us know if you see a problem with it, as that would mean a problem with SolrCloud today (even without this patch). Feedback to client is even possible if using ConcurrentSolrServer, because it has been changed to provide a Future from "request" - a Future that will eventually provide the calling client with the result from the request. This sounds like another great feature! It really deserves it's own issue so people can more easily provide review/feedback and not have it coupled to this issue. Some other additions that can be handled later that I see: SolrJ support for easier passing of version on a delete, constants for version , etc Use of "1" as a generic "exists" version (i.e. update document only if it already exists) If one document in a batch fails, don't automatically abort, and provide info back about which docs succeeded and which failed (your first point). That last one in particular needs some good discussion and design work and really deserves it's own issue. Down to the specifics of this patch... it's very non-invasive to core, consisting of the following code block (once for add, once for delete): if (versionOnUpdate != 0) { Long lastVersion = vinfo.lookupVersion(cmd.getIndexedId()); long foundVersion = lastVersion == null ? -1 : lastVersion; if ( versionOnUpdate == foundVersion || (versionOnUpdate < 0 && foundVersion < 0) ) { // we're ok if versions match, or if both are negative (all missing docs are equal) } else { throw new SolrException(ErrorCode.CONFLICT, "version conflict for " + cmd.getPrintableId() + " expected=" + versionOnUpdate + " actual=" + foundVersion); } } Having this current improvement committed in no way blocks any future improvements you may come up with (including deleting the code quoted above and using whatever method you have come up with for checking the versions), and it even uses the same API (or a subset of it) via version and 409. Progress, not perfection!
          Hide
          Yonik Seeley added a comment -

          Are there any technical objections to this patch?

          It really piggybacks on all of the SolrCloud work done to pass around and check versions, so it's really small.
          The API is as follows:

          • If you provide a version > 0 with an add or a delete, the update will fail with a 409 unless the provided version matches exactly the latest in the index for the ID.
          • If you provide a version < 0 with an add or delete, the update will fail with a 409 if the ID exists in the index.

          That's the whole scope of this patch, and I believe is compatible with the larger scope of what Per will provide in the future.

          Show
          Yonik Seeley added a comment - Are there any technical objections to this patch? It really piggybacks on all of the SolrCloud work done to pass around and check versions, so it's really small. The API is as follows: If you provide a version > 0 with an add or a delete, the update will fail with a 409 unless the provided version matches exactly the latest in the index for the ID. If you provide a version < 0 with an add or delete, the update will fail with a 409 if the ID exists in the index. That's the whole scope of this patch, and I believe is compatible with the larger scope of what Per will provide in the future.
          Hide
          Mark Miller added a comment -

          Per Steffensen, can you attach whatever you have, even if it's in rough form, so that the two approaches can be compared?

          Show
          Mark Miller added a comment - Per Steffensen, can you attach whatever you have, even if it's in rough form, so that the two approaches can be compared?
          Hide
          Per Steffensen added a comment -

          Mark Miller, we are working on a internal SVN branch where we started by talking in HEAD of apache solr trunk. From time to time we have merged new developments on apache solr trunk into our SVN branch. Right now we are in the process of doing that again, so that the patch we deliver will be a patch on top of a recent apache solr revision. Until that merge has finished I cannot attach our modifications unless you want them to fit on top of a pretty old (a month or so) revision of apache solr.
          My intention was to deliver nicely finished (completely tested, documented etc) features to you, but based on what I have heard from you and Yonik the last couple of days, I will change my strategy and deliver a rough patch as soon as the merge has finished, and then leave the final polishing for a more complete patch a little bit later. I hope to be able to deliver my rough patch in the middle of next week.
          Thanks a lot for you cooperation!

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - Mark Miller, we are working on a internal SVN branch where we started by talking in HEAD of apache solr trunk. From time to time we have merged new developments on apache solr trunk into our SVN branch. Right now we are in the process of doing that again, so that the patch we deliver will be a patch on top of a recent apache solr revision. Until that merge has finished I cannot attach our modifications unless you want them to fit on top of a pretty old (a month or so) revision of apache solr. My intention was to deliver nicely finished (completely tested, documented etc) features to you, but based on what I have heard from you and Yonik the last couple of days, I will change my strategy and deliver a rough patch as soon as the merge has finished, and then leave the final polishing for a more complete patch a little bit later. I hope to be able to deliver my rough patch in the middle of next week. Thanks a lot for you cooperation! Regards, Per Steffensen
          Hide
          Per Steffensen added a comment -

          Yonik Seeley, I agree that some of the stuff that I have added deserves own issues, but it doesnt change the fact that we hope to contribute them all in one go. I understand that it would have been nicer if we had sent you many smaller contributions instead one big contribution. We will learn from that and do that for future contributions. I hope this time that you (the committers) will accept a big contribution.

          I have created a separate issue for one of the things that is also going to be in our contribution - SOLR-3382

          Show
          Per Steffensen added a comment - Yonik Seeley, I agree that some of the stuff that I have added deserves own issues, but it doesnt change the fact that we hope to contribute them all in one go. I understand that it would have been nicer if we had sent you many smaller contributions instead one big contribution. We will learn from that and do that for future contributions. I hope this time that you (the committers) will accept a big contribution. I have created a separate issue for one of the things that is also going to be in our contribution - SOLR-3382
          Hide
          Per Steffensen added a comment -
          Show
          Per Steffensen added a comment - And SOLR-3383
          Hide
          Per Steffensen added a comment -

          And SOLR-3384, which is kinda independent but still cool. Im not sure we will deal with this issue for now though

          Show
          Per Steffensen added a comment - And SOLR-3384 , which is kinda independent but still cool. Im not sure we will deal with this issue for now though
          Hide
          Mark Miller added a comment -

          I will change my strategy and deliver a rough patch as soon as the merge has finished

          +1 - the Apache model likes dev to happen in the open as much as possible in piecemeal - it's more digestible, and helps ensure a lot of work is not done in a direction that others may not end up agreeing with. You don't want to dump a ton of finely polished work that has been fully documented and then find the community has other intentions.

          Even a patch that does not compile has a lot of value! Just specify it's early state and limitations - but it lets us know the general direction you are taking, allows early feedback, and helps any committers digest the work.

          Show
          Mark Miller added a comment - I will change my strategy and deliver a rough patch as soon as the merge has finished +1 - the Apache model likes dev to happen in the open as much as possible in piecemeal - it's more digestible, and helps ensure a lot of work is not done in a direction that others may not end up agreeing with. You don't want to dump a ton of finely polished work that has been fully documented and then find the community has other intentions. Even a patch that does not compile has a lot of value! Just specify it's early state and limitations - but it lets us know the general direction you are taking, allows early feedback, and helps any committers digest the work.
          Hide
          Per Steffensen added a comment -

          Mark Miller, its noted and I will certainly try to ajust behaviour for future work on Solr. Thanks for helping me understand the Apache process better.

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - Mark Miller, its noted and I will certainly try to ajust behaviour for future work on Solr. Thanks for helping me understand the Apache process better. Regards, Per Steffensen
          Hide
          Yonik Seeley added a comment -

          I will change my strategy and deliver a rough patch as soon as the merge has finished

          That's great, I'm sure you'll find more success with breaking it up into many smaller issues - different committers will be able to help on different things as their interest & time allow.

          I'd really like to try and get the bare essentials of optimistic locking in soon, so the sooner you can show us a draft patch for just that piece so we can decide what approach to go with, the better! There's no need to wait to merge up to trunk to show us how your approach differs.

          Show
          Yonik Seeley added a comment - I will change my strategy and deliver a rough patch as soon as the merge has finished That's great, I'm sure you'll find more success with breaking it up into many smaller issues - different committers will be able to help on different things as their interest & time allow. I'd really like to try and get the bare essentials of optimistic locking in soon, so the sooner you can show us a draft patch for just that piece so we can decide what approach to go with, the better! There's no need to wait to merge up to trunk to show us how your approach differs.
          Hide
          Per Steffensen added a comment -

          Will supply patch early next week.

          Guess I have broken (or will break it) the following "rule" on "How to contribute":
          "Combine multiple issues into a single patch, especially if they are unrelated or only loosely related."

          I have added the following to "How to contribute | Contributing your work" in order to state the philosophy described by Mark a little more clearly to future newbie contributers. Feel free to modify:
          "Supply first patch as early as possible and updated patches as often as possible during your work. This helps the rest of the community and committers to start understanding, help shaping, commenting on etc. your work throughout the entire process. Supplying a patch does not necessarily mean that it is complete and ready to be committed, it might also just be a way of communicating your idea and progress."

          Regards and have a nice weekend

          Show
          Per Steffensen added a comment - Will supply patch early next week. Guess I have broken (or will break it) the following "rule" on "How to contribute": "Combine multiple issues into a single patch, especially if they are unrelated or only loosely related." I have added the following to "How to contribute | Contributing your work" in order to state the philosophy described by Mark a little more clearly to future newbie contributers. Feel free to modify: "Supply first patch as early as possible and updated patches as often as possible during your work. This helps the rest of the community and committers to start understanding, help shaping, commenting on etc. your work throughout the entire process. Supplying a patch does not necessarily mean that it is complete and ready to be committed, it might also just be a way of communicating your idea and progress." Regards and have a nice weekend
          Hide
          Per Steffensen added a comment -

          it is hard to push too many small partly-done features.

          As committers it's the opposite - it's sometimes hard to consume patches that do a lot of different things.

          I could imagine. Im sorry that it is possibly going to be a little bit like that this time. But I believe I will provide great stuff.

          It works with multi-document update request, providing fully detailed feedback to the client about which document-updates failed (and why) and which succeeded.

          That's great! But it's also a separate feature that we've needed for a while (and I think there has been a JIRA issue open for it for a while).

          Yes I agree. See SOLR-3382. The solution I will provide is generic in the way that is supports giving feedback to the client about "parts" of a request that failed. The client will know that the "parts" which are not reported to have failed did succeed. Usefull for any kind of request where parts of it can fail and other parts can succeed. The client provide a "partRef" for all parts, and partial errors from the server each contain a "partRef" in order to let the client know which part(s) failed. I only use it for multi-document (or batch-updates as I see you call it in come places) updates (here each document is a "part"), but I guess it could also be used for other things like

          • Reporting which operations of a multi-operation (e.g. one containing both adds, deletes, commits, etc) update that failed.
          • Reporting if a single update partly failed (e.g. if using replication and the update was successfull on leader but not on one of more of the replicas)
          • Etc.

          Looking shortly at you patch, I belive, that if two threads in the server JVM overlaps in a way that is unfortunate enough, then it is possible that data will not be stored or will be overwritten without an exception being thrown to indicate that to the client.

          I'm confident in the synchronization/concurrency part - it's just reusing what was put in place for SolrCloud to handle reordering of updates to replicas and is very well tested (see TestRealTimeGet). But please let us know if you see a problem with it, as that would mean a problem with SolrCloud today (even without this patch).

          Looking closer at your code I would like to withdraw my concern about your solution. I didnt see that you are doing both "get existing version and check against provided version" and "store updated version" inside the same block governed by two sync-locks

          • A readlock on a ReadWriteLock which is writelocked during deleteByQueries (your "vinfo.lockForUpdate"), protecting against concurent deletes (by query)
          • A lock on the uniqueKey field (or a hash of it in order to be efficient) of the document (your "synchronized (bucket)"), protecting against concurrent updates and deletes (by id)

          So basically I have repeated that locking-pattern in DirectUpdateHandler2 where I chose to place the "unique key constaint"- and "versioning"- check. Guess my locking can be removed if it is really necessary to have this kind of locking "this far up in the callstack" (in DistributedUpdateProcessor.versionAdd). For the purpose of "unique key constraint"- and "versioning"-check I believe it is only necessary to have this kind of locking around the actual updates in DirectUpdateHandler2 (when leader and not peer_sync) - and the smaller you can make synchronized blocks the better. But maybe you have reasons (related to SolrCloud) where you need the locking "this far up in the callstack".
          But besides that fact that the locking I did in DirectUpdateHandler2 can be removed I really believe that the actual checking of "unique key constraint"- and "version"-violation logically belong in DirectUpdateHandler2, so I guess I will still try to convince you to take that part of my patch in.
          Besides that, I believe your solution just implements what I call semantics-mode "classic-consistency-hybrid" on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics. I believe the "classic" mode and "consistency" mode are very nice to have in order to provide complete backward compabitility (classic mode) and a very strict consistency mode where it is not possible to "break the checks" by providing _version_ = 0 (consistency mode).
          Finally I believe my solution is a little better with respect to "design" and "separation of concerns", by isolatating the rules and rule-checks in a separate class instead of just coding it directly inside DistributedUpdateProcessor which already deals with a lot of other concerns, but that is another discussion, and should not count as the deciding argument.

          Feedback to client is even possible if using ConcurrentSolrServer, because it has been changed to provide a Future from "request" - a Future that will eventually provide the calling client with the result from the request.

          This sounds like another great feature! It really deserves it's own issue so people can more easily provide review/feedback and not have it coupled to this issue.

          Created the isolated issue - see SOLR-3383

          Some other additions that can be handled later that I see:

          SolrJ support for easier passing of version on a delete, constants for version, etc

          Well basically I have moved VERSION_FIELD = "_version_" from VersionInfo to SolrInputDocument.

          Use of "1" as a generic "exists" version (i.e. update document only if it already exists)

          Well I started out by wanting that feature (update only if exists, but no exact version check), but you couldnt see the good reason for that. Since then I realized that you are probably right and have removed that exact possibility - of course in classic and classic-consistency-hybrid modes you can still do updates without providing the exact version, but there is error if the document does not exist.

          If one document in a batch fails, don't automatically abort, and provide info back about which docs succeeded and which failed (your first point).

          That is part of my solution - basically the thing making it necessary to add the ability to send many errors back to the client - SOLR-3382

          That last one in particular needs some good discussion and design work and really deserves it's own issue.

          Agree. Go discuss and comment on SOLR-3382

          Having this current improvement committed in no way blocks any future improvements you may come up with (including deleting the code quoted above and using whatever method you have come up with for checking the versions), and it even uses the same API (or a subset of it) via version and 409. Progress, not perfection!

          Agree, so I guess you could go commit, but that would just make my merge-task a little bigger, so I hope you will hold your horses a few days more.

          Are there any technical objections to this patch?

          Except for the things I mention here and there in this response, no. And those are all things that could be fixed later, going nicely hand in hand with your great philosophy of "progress, not perfection".

          It really piggybacks on all of the SolrCloud work done to pass around and check versions, so it's really small.

          Yes it is small and beautiful

          The API is as follows:

          a) If you provide a version > 0 with an add or a delete, the update will fail with a 409 unless the provided version matches exactly the latest in the index for the ID.
          b) If you provide a version < 0 with an add or delete, the update will fail with a 409 if the ID exists in the index.

          You always provide the same error (except the versions in the message). You do not distinguish between DocumentAlreadyExist, DocumentDoesNotExist and VersionConflict errors as I do. Of course the client would know that it is a DocumentAlreadyExists if he sent a negative version number and got 409 (at least as long as 409 is not used for other kind of conflicts that can also happen when version < 0) and that it is DocumentDoesNotExist or VersionConflict if he sent a positive version number and got 409 (at least ...). But he really doesnt know if it is DocumentDoesNotExist or VersionConflict unless he parses the error message, and that might be important to know because he might want to react differently - e.g.

          • On DocumentDoesNotExist: Either do nothing or go do a create (version < 0) of the document
          • On VersionConflict: Go refetch the document from Solr, merge in his changes (automatically or by the help from a user) and try update again.

          Of course you also need to parse error-type in my solution to know exactly what the problem is, but you need no "rules" on client side (e.g. knowing that if you sent version > 0 and you got version-already-in-store < 0 back, it is DocumentDoesNotExist), and in SolrJ my solution automatically parses error-type and throw corresponding exceptions.

          Ad b) Maybe you dont want an error if you try to delete a document which has already been deleted, but probably ok

          That's the whole scope of this patch, and I believe is compatible with the larger scope of what Per will provide in the future.

          Agree!

          Show
          Per Steffensen added a comment - it is hard to push too many small partly-done features. As committers it's the opposite - it's sometimes hard to consume patches that do a lot of different things. I could imagine. Im sorry that it is possibly going to be a little bit like that this time. But I believe I will provide great stuff. It works with multi-document update request, providing fully detailed feedback to the client about which document-updates failed (and why) and which succeeded. That's great! But it's also a separate feature that we've needed for a while (and I think there has been a JIRA issue open for it for a while). Yes I agree. See SOLR-3382 . The solution I will provide is generic in the way that is supports giving feedback to the client about "parts" of a request that failed. The client will know that the "parts" which are not reported to have failed did succeed. Usefull for any kind of request where parts of it can fail and other parts can succeed. The client provide a "partRef" for all parts, and partial errors from the server each contain a "partRef" in order to let the client know which part(s) failed. I only use it for multi-document (or batch-updates as I see you call it in come places) updates (here each document is a "part"), but I guess it could also be used for other things like Reporting which operations of a multi-operation (e.g. one containing both adds, deletes, commits, etc) update that failed. Reporting if a single update partly failed (e.g. if using replication and the update was successfull on leader but not on one of more of the replicas) Etc. Looking shortly at you patch, I belive, that if two threads in the server JVM overlaps in a way that is unfortunate enough, then it is possible that data will not be stored or will be overwritten without an exception being thrown to indicate that to the client. I'm confident in the synchronization/concurrency part - it's just reusing what was put in place for SolrCloud to handle reordering of updates to replicas and is very well tested (see TestRealTimeGet). But please let us know if you see a problem with it, as that would mean a problem with SolrCloud today (even without this patch). Looking closer at your code I would like to withdraw my concern about your solution. I didnt see that you are doing both "get existing version and check against provided version" and "store updated version" inside the same block governed by two sync-locks A readlock on a ReadWriteLock which is writelocked during deleteByQueries (your "vinfo.lockForUpdate"), protecting against concurent deletes (by query) A lock on the uniqueKey field (or a hash of it in order to be efficient) of the document (your "synchronized (bucket)"), protecting against concurrent updates and deletes (by id) So basically I have repeated that locking-pattern in DirectUpdateHandler2 where I chose to place the "unique key constaint"- and "versioning"- check. Guess my locking can be removed if it is really necessary to have this kind of locking "this far up in the callstack" (in DistributedUpdateProcessor.versionAdd). For the purpose of "unique key constraint"- and "versioning"-check I believe it is only necessary to have this kind of locking around the actual updates in DirectUpdateHandler2 (when leader and not peer_sync) - and the smaller you can make synchronized blocks the better. But maybe you have reasons (related to SolrCloud) where you need the locking "this far up in the callstack". But besides that fact that the locking I did in DirectUpdateHandler2 can be removed I really believe that the actual checking of "unique key constraint"- and "version"-violation logically belong in DirectUpdateHandler2, so I guess I will still try to convince you to take that part of my patch in. Besides that, I believe your solution just implements what I call semantics-mode "classic-consistency-hybrid" on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics . I believe the "classic" mode and "consistency" mode are very nice to have in order to provide complete backward compabitility (classic mode) and a very strict consistency mode where it is not possible to "break the checks" by providing _version_ = 0 (consistency mode). Finally I believe my solution is a little better with respect to "design" and "separation of concerns", by isolatating the rules and rule-checks in a separate class instead of just coding it directly inside DistributedUpdateProcessor which already deals with a lot of other concerns, but that is another discussion, and should not count as the deciding argument. Feedback to client is even possible if using ConcurrentSolrServer, because it has been changed to provide a Future from "request" - a Future that will eventually provide the calling client with the result from the request. This sounds like another great feature! It really deserves it's own issue so people can more easily provide review/feedback and not have it coupled to this issue. Created the isolated issue - see SOLR-3383 Some other additions that can be handled later that I see: SolrJ support for easier passing of version on a delete, constants for version, etc Well basically I have moved VERSION_FIELD = "_version_" from VersionInfo to SolrInputDocument. Use of "1" as a generic "exists" version (i.e. update document only if it already exists) Well I started out by wanting that feature (update only if exists, but no exact version check), but you couldnt see the good reason for that. Since then I realized that you are probably right and have removed that exact possibility - of course in classic and classic-consistency-hybrid modes you can still do updates without providing the exact version, but there is error if the document does not exist. If one document in a batch fails, don't automatically abort, and provide info back about which docs succeeded and which failed (your first point). That is part of my solution - basically the thing making it necessary to add the ability to send many errors back to the client - SOLR-3382 That last one in particular needs some good discussion and design work and really deserves it's own issue. Agree. Go discuss and comment on SOLR-3382 Having this current improvement committed in no way blocks any future improvements you may come up with (including deleting the code quoted above and using whatever method you have come up with for checking the versions), and it even uses the same API (or a subset of it) via version and 409. Progress, not perfection! Agree, so I guess you could go commit, but that would just make my merge-task a little bigger, so I hope you will hold your horses a few days more. Are there any technical objections to this patch? Except for the things I mention here and there in this response, no. And those are all things that could be fixed later, going nicely hand in hand with your great philosophy of "progress, not perfection". It really piggybacks on all of the SolrCloud work done to pass around and check versions, so it's really small. Yes it is small and beautiful The API is as follows: a) If you provide a version > 0 with an add or a delete, the update will fail with a 409 unless the provided version matches exactly the latest in the index for the ID. b) If you provide a version < 0 with an add or delete, the update will fail with a 409 if the ID exists in the index. You always provide the same error (except the versions in the message). You do not distinguish between DocumentAlreadyExist, DocumentDoesNotExist and VersionConflict errors as I do. Of course the client would know that it is a DocumentAlreadyExists if he sent a negative version number and got 409 (at least as long as 409 is not used for other kind of conflicts that can also happen when version < 0) and that it is DocumentDoesNotExist or VersionConflict if he sent a positive version number and got 409 (at least ...). But he really doesnt know if it is DocumentDoesNotExist or VersionConflict unless he parses the error message, and that might be important to know because he might want to react differently - e.g. On DocumentDoesNotExist: Either do nothing or go do a create (version < 0) of the document On VersionConflict: Go refetch the document from Solr, merge in his changes (automatically or by the help from a user) and try update again. Of course you also need to parse error-type in my solution to know exactly what the problem is, but you need no "rules" on client side (e.g. knowing that if you sent version > 0 and you got version-already-in-store < 0 back, it is DocumentDoesNotExist), and in SolrJ my solution automatically parses error-type and throw corresponding exceptions. Ad b) Maybe you dont want an error if you try to delete a document which has already been deleted, but probably ok That's the whole scope of this patch, and I believe is compatible with the larger scope of what Per will provide in the future. Agree!
          Hide
          Per Steffensen added a comment - - edited

          Find attached SOLR_3173_3178_3382_plus.patch which should fit on top of revision 1327417.

          Patch includes:

          • All our code-changes done during the work with SOLR-3173, SOLR-3178 and SOLR-3382. How it is supposed to work in more detail, is described on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics and is of course also expressed in included tests.
          • Also includes
            • Misc clean-up
              • E.g. removing constants redundant OVERWRITE = "overwrite" so that only UpdateParams.OVERWRITE is left. You need very good arguments to ever use different "names" for the same thing (overwrite) among XML, JSON, HTTP request params etc, so it is kind of wrong to have the constant defined for each of those.
            • "Unimportant changes" - e.g.
              • Corrected spelling errors
              • Removed unnecessary imports

          Not implemented yet

          • Error propagation for redistributed updates. That is, the responses from the redistributed updates in DistributedUpdateProcessor using SolrCmdDistributor, have to be collected an merged into a combined response from "this" DistributedUpdateProcessor.
            • Both implementation and tests are missing.
          • Tests verifying that updates using JSON and CSV requests as described on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics also works as described. Especially transfer of partRefs in JSON and CSV requests.
          • Tests verifying that semanticsMode "consistency" works as it is supposed to - just like ClassicConsistencyHybridXXXTests tests this for "classic-consistency-hybrid" semanticsMode and like ClassicUpdateSemanticsTest shortly tests it when semanticsMode is explicitly set to "classic" ("classic" is the default so this mode is already tested a lot using all the others tests in the project)

          Other stuff not done yet

          • Add Apache 2.0 Licence to all new files
          • Check correct indenting
          • JavaDoc for public methods
          Show
          Per Steffensen added a comment - - edited Find attached SOLR_3173_3178_3382_plus.patch which should fit on top of revision 1327417. Patch includes: All our code-changes done during the work with SOLR-3173 , SOLR-3178 and SOLR-3382 . How it is supposed to work in more detail, is described on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics and is of course also expressed in included tests. Also includes Misc clean-up E.g. removing constants redundant OVERWRITE = "overwrite" so that only UpdateParams.OVERWRITE is left. You need very good arguments to ever use different "names" for the same thing (overwrite) among XML, JSON, HTTP request params etc, so it is kind of wrong to have the constant defined for each of those. "Unimportant changes" - e.g. Corrected spelling errors Removed unnecessary imports Not implemented yet Error propagation for redistributed updates. That is, the responses from the redistributed updates in DistributedUpdateProcessor using SolrCmdDistributor, have to be collected an merged into a combined response from "this" DistributedUpdateProcessor. Both implementation and tests are missing. Tests verifying that updates using JSON and CSV requests as described on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics also works as described. Especially transfer of partRefs in JSON and CSV requests. Tests verifying that semanticsMode "consistency" works as it is supposed to - just like ClassicConsistencyHybridXXXTests tests this for "classic-consistency-hybrid" semanticsMode and like ClassicUpdateSemanticsTest shortly tests it when semanticsMode is explicitly set to "classic" ("classic" is the default so this mode is already tested a lot using all the others tests in the project) Other stuff not done yet Add Apache 2.0 Licence to all new files Check correct indenting JavaDoc for public methods
          Hide
          Yonik Seeley added a comment -

          Thanks for the patch Per!

          I think I agree it would be nice to have more of the functionality consolidated to the update handler and not do quite so much "up the callstack"... of course the downside to this would be the context that the update processors have (namely they know when a request, which can contain multiple updates, starts and finishes).

          Regardless - we are where we are right now... and doing the version check in the update handler presents some problems. For cloud, we want to start sending to replicas at the same time as indexing locally. We also want to check the version before sending to replicas since it would be easy to get into a scenario where a conditional update fails on some replicas and succeeds on others (due to the fact that updates can easily be seen by different replicas in a different order).

          So I think the best way forward is to commit the patch that does the version check in the distributed update processor, and then build off of that (for instance your more nuanced error messages and the set/get requestVersion on the update command).

          Show
          Yonik Seeley added a comment - Thanks for the patch Per! I think I agree it would be nice to have more of the functionality consolidated to the update handler and not do quite so much "up the callstack"... of course the downside to this would be the context that the update processors have (namely they know when a request, which can contain multiple updates, starts and finishes). Regardless - we are where we are right now... and doing the version check in the update handler presents some problems. For cloud, we want to start sending to replicas at the same time as indexing locally. We also want to check the version before sending to replicas since it would be easy to get into a scenario where a conditional update fails on some replicas and succeeds on others (due to the fact that updates can easily be seen by different replicas in a different order). So I think the best way forward is to commit the patch that does the version check in the distributed update processor, and then build off of that (for instance your more nuanced error messages and the set/get requestVersion on the update command).
          Hide
          Per Steffensen added a comment - - edited

          First of all. Thanks a lot for getting back to me, and taking the time to try to understand such a big patch covering several jira issues - Im still sorry about that.

          I think I agree it would be nice to have more of the functionality consolidated to the update handler and not do quite so much "up the callstack"... of course the downside to this would be the context that the update processors have (namely they know when a request, which can contain multiple updates, starts and finishes).

          I agree and understand about the different context, but failing/succeeding with PartialError (VersionConflict, DocumentAlreadyExists, DocumentDoesNotExist or WrongUsage) is a per-document-to-be-updated thing. Therefore I believe it does not matter in this case, or acutally, that might just be the argument that it belongs in UpdateHandler. Certainly it can be done in UpdateHandler.

          Regardless - we are where we are right now... and doing the version check in the update handler presents some problems. For cloud, we want to start sending to replicas at the same time as indexing locally.

          Im not sure I understand. As I read the code, we are not doing that as it is today. In DistributedUpdateProcessor.processAdd you first do the local add (as the leader) in versionAdd AND THEN, if you succeed, forwards to all replica (in the if (nodes <> null) block). I believe it is a good strategy to first do update on leader and then, if success on leader, do update on replica concurrently (and asynchronously). So stick with that!

          So doing version-check etc (when leader) in DirectUpdateHandler2 does not represent a problem. Errors occuring as a result of version-check etc should just (as any other error) result in update not forwarded to replica.

          One problem in my patch is that DirectUpdateHandler2 does not notify (by throwing error or making sure dropCmd is true) DistributedUpdateProcessor to not send to replica in case of PartialError (VersionConflict, DocumentAlreadyExists, DOcumentDoesNotExists or WrongUsage). It will in the next upcomming patch.

          We also want to check the version before sending to replicas since it would be easy to get into a scenario where a conditional update fails on some replicas and succeeds on others (due to the fact that updates can easily be seen by different replicas in a different order).

          Agree about check version on leader before sending to replica. But that can equally well be done in "DistributedUpdateProcessor.versionAdd" (inside the locks) as you do, as it can be done in "DirectUpdateHandler2.addDoc" as I do. So with respect to that it doesnt matter if you take your patch or mine. Of course when I change my patch so that version-check etc errors in DirectUpdateHandler2 actually ensures that the update is not sent to replica.

          The idea is that version-checks and uniqueKey-already-exists-checks etc, leading to PartialError, are only performed on leader, so replica will not fail with this kind of errors - and they shouldnt. Please note "cmd.isLeaderLogic()" in DirectUpdateHandler2.addDoc in the line

          boolean getAndCheckAgainstExisting = semanticsMode.needToGetAndCheckAgainstExistingDocument(cmd) && cmd.isLeaderLogic();
          

          Replica are supposed to be "dumb" in the way that they just index (without any checking) what was successfully indexed on the leader (and therefore forwarded to replica in the first place), and support getting updates in different orders by not indexing updates if they are "older" than an update they have already indexed.

          So I think the best way forward is to commit the patch that does the version check in the distributed update processor, and then build off of that (for instance your more nuanced error messages and the set/get requestVersion on the update command).

          I have to say that I do not agree. Believe I have argued why the problems you mention with my patch/approach is not true (when I make the small fix mentioned). Sure your patch is a small step forward (progress, not perfection), but building on top of that will put me in a situation where I have to make changes to my patch in order to get all the extra bennefits it provide (much more progresss, not perfection ). Of course you can commit your patch, but I still believe the code in that patch should be deleted when adding my patch, and I still hope that will be the end result.

          I will provide patch soon with the following changes

          • PartialError in DirectUpdateHandler2 on leader will make DistributedUpdateProcessor not forward to replica (as with any other error on leader)
          • Error propagation will work when client contact server that is not the leader of the document to be updated, and therefore forwards to leader, and therefore also have to propagate errors from that leader back to client. His has always worked for single errors, where you just abort the entire update on the first error, leaving no information to the client about what failed and what did not, but doesnt work now that you can have partial errors where some documents fail and other succeeds (on different servers) and you want to provide a detailed picture about it back to the client.
          • Cleanup of a few things in the patch that should never have been included (survivals from early attempts rolled back on my side)
          Show
          Per Steffensen added a comment - - edited First of all. Thanks a lot for getting back to me, and taking the time to try to understand such a big patch covering several jira issues - Im still sorry about that. I think I agree it would be nice to have more of the functionality consolidated to the update handler and not do quite so much "up the callstack"... of course the downside to this would be the context that the update processors have (namely they know when a request, which can contain multiple updates, starts and finishes). I agree and understand about the different context, but failing/succeeding with PartialError (VersionConflict, DocumentAlreadyExists, DocumentDoesNotExist or WrongUsage) is a per-document-to-be-updated thing. Therefore I believe it does not matter in this case, or acutally, that might just be the argument that it belongs in UpdateHandler. Certainly it can be done in UpdateHandler. Regardless - we are where we are right now... and doing the version check in the update handler presents some problems. For cloud, we want to start sending to replicas at the same time as indexing locally. Im not sure I understand. As I read the code, we are not doing that as it is today. In DistributedUpdateProcessor.processAdd you first do the local add (as the leader) in versionAdd AND THEN, if you succeed, forwards to all replica (in the if (nodes <> null) block). I believe it is a good strategy to first do update on leader and then, if success on leader, do update on replica concurrently (and asynchronously). So stick with that! So doing version-check etc (when leader) in DirectUpdateHandler2 does not represent a problem. Errors occuring as a result of version-check etc should just (as any other error) result in update not forwarded to replica. One problem in my patch is that DirectUpdateHandler2 does not notify (by throwing error or making sure dropCmd is true) DistributedUpdateProcessor to not send to replica in case of PartialError (VersionConflict, DocumentAlreadyExists, DOcumentDoesNotExists or WrongUsage). It will in the next upcomming patch. We also want to check the version before sending to replicas since it would be easy to get into a scenario where a conditional update fails on some replicas and succeeds on others (due to the fact that updates can easily be seen by different replicas in a different order). Agree about check version on leader before sending to replica. But that can equally well be done in "DistributedUpdateProcessor.versionAdd" (inside the locks) as you do, as it can be done in "DirectUpdateHandler2.addDoc" as I do. So with respect to that it doesnt matter if you take your patch or mine. Of course when I change my patch so that version-check etc errors in DirectUpdateHandler2 actually ensures that the update is not sent to replica. The idea is that version-checks and uniqueKey-already-exists-checks etc, leading to PartialError, are only performed on leader, so replica will not fail with this kind of errors - and they shouldnt. Please note "cmd.isLeaderLogic()" in DirectUpdateHandler2.addDoc in the line boolean getAndCheckAgainstExisting = semanticsMode.needToGetAndCheckAgainstExistingDocument(cmd) && cmd.isLeaderLogic(); Replica are supposed to be "dumb" in the way that they just index (without any checking) what was successfully indexed on the leader (and therefore forwarded to replica in the first place), and support getting updates in different orders by not indexing updates if they are "older" than an update they have already indexed. So I think the best way forward is to commit the patch that does the version check in the distributed update processor, and then build off of that (for instance your more nuanced error messages and the set/get requestVersion on the update command). I have to say that I do not agree. Believe I have argued why the problems you mention with my patch/approach is not true (when I make the small fix mentioned). Sure your patch is a small step forward (progress, not perfection), but building on top of that will put me in a situation where I have to make changes to my patch in order to get all the extra bennefits it provide (much more progresss, not perfection ). Of course you can commit your patch, but I still believe the code in that patch should be deleted when adding my patch, and I still hope that will be the end result. I will provide patch soon with the following changes PartialError in DirectUpdateHandler2 on leader will make DistributedUpdateProcessor not forward to replica (as with any other error on leader) Error propagation will work when client contact server that is not the leader of the document to be updated, and therefore forwards to leader, and therefore also have to propagate errors from that leader back to client. His has always worked for single errors, where you just abort the entire update on the first error, leaving no information to the client about what failed and what did not, but doesnt work now that you can have partial errors where some documents fail and other succeeds (on different servers) and you want to provide a detailed picture about it back to the client. Cleanup of a few things in the patch that should never have been included (survivals from early attempts rolled back on my side)
          Hide
          Yonik Seeley added a comment -

          > For cloud, we want to start sending to replicas at the same time as indexing locally.

          As I read the code, we are not doing that as it is today.

          Right - I just realized the ambiguity of the phrasing. By "we want to start", I meant that we want to investigate/implement that option (to send to replicas concurrently with indexing locally).

          Show
          Yonik Seeley added a comment - > For cloud, we want to start sending to replicas at the same time as indexing locally. As I read the code, we are not doing that as it is today. Right - I just realized the ambiguity of the phrasing. By "we want to start", I meant that we want to investigate/implement that option (to send to replicas concurrently with indexing locally).
          Hide
          Mark Miller added a comment -

          Yeah, it would be nice to offer a wide range of indexing options - I'd really like to see us offer down to "fire and forget" - the add waits for no response and adds are done locally and remotely in parallel. AFAIK MongoDB actually does this by default - I don't think we should do that, but I'm interested in offering a range from that up to full guarantees.

          Show
          Mark Miller added a comment - Yeah, it would be nice to offer a wide range of indexing options - I'd really like to see us offer down to "fire and forget" - the add waits for no response and adds are done locally and remotely in parallel. AFAIK MongoDB actually does this by default - I don't think we should do that, but I'm interested in offering a range from that up to full guarantees.
          Hide
          Per Steffensen added a comment -

          Agree, people have different preferences on how to use Solr, and therefore a.o. how the exact semantics of update should be. My solution is also better prepared for that, since I offer the "semanticsMode" property in updatehandler (DirectUpdateHandler2) where you can state if you want strict versioning ("consistency" mode) or the good old semantics ("classic" mode) or a mix ("classic-consistency-mode" mode). Guess we will always need to do leader-first-then-replica I cases where we want to do strict versioning control, but in "classic" mode (or a new future mode) we could certainly consider doing leader-and-replica-in-parallel. Will provide new patch today or tomorrow.

          Show
          Per Steffensen added a comment - Agree, people have different preferences on how to use Solr, and therefore a.o. how the exact semantics of update should be. My solution is also better prepared for that, since I offer the "semanticsMode" property in updatehandler (DirectUpdateHandler2) where you can state if you want strict versioning ("consistency" mode) or the good old semantics ("classic" mode) or a mix ("classic-consistency-mode" mode). Guess we will always need to do leader-first-then-replica I cases where we want to do strict versioning control, but in "classic" mode (or a new future mode) we could certainly consider doing leader-and-replica-in-parallel. Will provide new patch today or tomorrow.
          Hide
          Per Steffensen added a comment - - edited

          "Fire and forget" will also be much cooler with a real async client interface. We will provide that very soon - see SOLR-3383.

          Show
          Per Steffensen added a comment - - edited "Fire and forget" will also be much cooler with a real async client interface. We will provide that very soon - see SOLR-3383 .
          Hide
          Yonik Seeley added a comment -

          since I offer the "semanticsMode" property in updatehandler

          Seems like this should be up to the client... why wouldn't update semantics be handled on a per-request basis?

          Show
          Yonik Seeley added a comment - since I offer the "semanticsMode" property in updatehandler Seems like this should be up to the client... why wouldn't update semantics be handled on a per-request basis?
          Hide
          Per Steffensen added a comment - - edited

          since I offer the "semanticsMode" property in updatehandler

          Seems like this should be up to the client... why wouldn't update semantics be handled on a per-request basis?

          Maybe we misunderstand each other a little bit. I just agree that people want Solr to provide different kinds of semantics - some want strict version-check, some dont, some will have an fire-and-forget approach, some will not. I just ment to say that for some of those types of semantics you want server-side Solr to behave in a certain way. For that you probably need a configuration telling the server how to behave. I have introduced that for updates with the "semanticsMode" config on updatehandler.

          I believe, when it comes to whether you want strict unique-key-constraint- and version-checks or not, it is probably something you want to configure server-side, so that you can control the possibilities of your clients. Not all Solr-based systems will have complete control over all its clients, and then it is nice on server-side to be able to say (using semanticsMode):

          • consistency: I dont want you as a client to make updates to documents, without even having read the existing document first (and merged your updates into that), and I dont want you to be able to overwrite changes to that document that happend to be made between the time you read it and the time you provided your updated version. And I dont want you to incidentally overwrite an existing document, if your intent was to create a new one.
          • classic: All updates just go right into Solr, but you risk overwriting exsisting stuff
          • classic-concistency-hybrid: You can do both consistency and classic updates, be casefull when doing classic updates
            I believe this kind of control belongs in a server-side config, because it is a way of "protecting" your data against badly written clients.

          With respect to whether you want to do fire-and-forget or not, that should be up to the client, but it kinda is by definition - basically just ignore results. But fire-and-forget strategy on clients is really not worth much, if you have to wait for the server to reply when you do updates (like you have to do today). SOLR-3383 will provide a all-async client SolrJ API (Future-based all the way) so that you can really just "fire", and if you intent is to "forget" anyway, then you can move on immediately after your "fire" without waiting for the server to reply.

          That is not the reason we made SOLR-3383 though. It is because we want to be able to get results back when using ConcurrentSolrServer, and that requires a Future-based request API.

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - - edited since I offer the "semanticsMode" property in updatehandler Seems like this should be up to the client... why wouldn't update semantics be handled on a per-request basis? Maybe we misunderstand each other a little bit. I just agree that people want Solr to provide different kinds of semantics - some want strict version-check, some dont, some will have an fire-and-forget approach, some will not. I just ment to say that for some of those types of semantics you want server-side Solr to behave in a certain way. For that you probably need a configuration telling the server how to behave. I have introduced that for updates with the "semanticsMode" config on updatehandler. I believe, when it comes to whether you want strict unique-key-constraint- and version-checks or not, it is probably something you want to configure server-side, so that you can control the possibilities of your clients. Not all Solr-based systems will have complete control over all its clients, and then it is nice on server-side to be able to say (using semanticsMode): consistency: I dont want you as a client to make updates to documents, without even having read the existing document first (and merged your updates into that), and I dont want you to be able to overwrite changes to that document that happend to be made between the time you read it and the time you provided your updated version. And I dont want you to incidentally overwrite an existing document, if your intent was to create a new one. classic: All updates just go right into Solr, but you risk overwriting exsisting stuff classic-concistency-hybrid: You can do both consistency and classic updates, be casefull when doing classic updates I believe this kind of control belongs in a server-side config, because it is a way of "protecting" your data against badly written clients. With respect to whether you want to do fire-and-forget or not, that should be up to the client, but it kinda is by definition - basically just ignore results. But fire-and-forget strategy on clients is really not worth much, if you have to wait for the server to reply when you do updates (like you have to do today). SOLR-3383 will provide a all-async client SolrJ API (Future-based all the way) so that you can really just "fire", and if you intent is to "forget" anyway, then you can move on immediately after your "fire" without waiting for the server to reply. That is not the reason we made SOLR-3383 though. It is because we want to be able to get results back when using ConcurrentSolrServer, and that requires a Future-based request API. Regards, Per Steffensen
          Hide
          Per Steffensen added a comment - - edited

          Find attached SOLR-3173_3178_3382_3428_plus.patch

          The patch fits on top of trunk revision 1332666 and is ready for commit.

          Since last patch (SOLR_3173_3178_3382_plus.patch) I have made the following:

          • Implemented test ClassicConsistencyHybridUpdateSemanticsSolrCloudTest verifying that partial errors are propagated correctly in a cloud (ZK) setup, when an update document is not sent directly to the Solr instance running the leader of the slice where the document is to be stored, and the Solr instance therefore has to forward the document to the leader, and when the leader forwards to replica. This made me discover the problems discribed in SOLR-3428, which I had to fix in order to make the test green.
          • Implemented tests (in JsonLoaderTest and CSVRequestHandlerTest) verifying that you can send part references as claimed on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics when you are sending JSON or CSV requests
          • Documents for which a partial error occur on leader will not be forwarded to replica.
          • Partial errors are now catched and collected for the entire request in DistributedUpdateProcessor, but the actual version-check etc is still performed in DirectUpdateHandler2 (because that is an per-document-level task)
          • Added Apache License 2.0 text to all new files
          • Added class-level-JavaDoc to all new classes
          • Removed a few unimportant changes

          I will be leaving for an extended weekend now, and will not be available again before wedensday next week. I will probably read and answer comments from time to time until saturday night, though.

          I really really really hope to return to the great news about the patch having been committed. I believe it is now really ready - well worked through features, well covered by tests and all existing plus new tests are green. Great progress, maybe not perfection, but we can shape the last edges later if we find any. Hope you will take the opportunity to commit before too long, now that you have a patch based on a fairly new revision, to avoid having to many conflicts to solve.

          Just to avoid any mistakes - the patch covers SOLR-3173, SOLR-3178, SOLR-3382 and SOLR-3428.

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - - edited Find attached SOLR-3173 _3178_3382_3428_plus.patch The patch fits on top of trunk revision 1332666 and is ready for commit. Since last patch (SOLR_3173_3178_3382_plus.patch) I have made the following: Implemented test ClassicConsistencyHybridUpdateSemanticsSolrCloudTest verifying that partial errors are propagated correctly in a cloud (ZK) setup, when an update document is not sent directly to the Solr instance running the leader of the slice where the document is to be stored, and the Solr instance therefore has to forward the document to the leader, and when the leader forwards to replica. This made me discover the problems discribed in SOLR-3428 , which I had to fix in order to make the test green. Implemented tests (in JsonLoaderTest and CSVRequestHandlerTest) verifying that you can send part references as claimed on http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics when you are sending JSON or CSV requests Documents for which a partial error occur on leader will not be forwarded to replica. Partial errors are now catched and collected for the entire request in DistributedUpdateProcessor, but the actual version-check etc is still performed in DirectUpdateHandler2 (because that is an per-document-level task) Added Apache License 2.0 text to all new files Added class-level-JavaDoc to all new classes Removed a few unimportant changes I will be leaving for an extended weekend now, and will not be available again before wedensday next week. I will probably read and answer comments from time to time until saturday night, though. I really really really hope to return to the great news about the patch having been committed. I believe it is now really ready - well worked through features, well covered by tests and all existing plus new tests are green. Great progress, maybe not perfection, but we can shape the last edges later if we find any. Hope you will take the opportunity to commit before too long, now that you have a patch based on a fairly new revision, to avoid having to many conflicts to solve. Just to avoid any mistakes - the patch covers SOLR-3173 , SOLR-3178 , SOLR-3382 and SOLR-3428 . Regards, Per Steffensen
          Hide
          Per Steffensen added a comment - - edited

          Sad to see that you chose to commit a different patch for this one, making me believe that you are not going to take my patch in. I really dont understand why, because I think my solution is at least as good or better in any aspect, and besides that it is more complete, backward compatible and solves a lot of other issues (SOLR-3173, SOLR-3382 and SOLR-3428) already - can I ask you to tell me why you made that choice? Maybe what is wrong with our patch since it is not taken in, what we need to change etc?

          Please also elaborate on your plans wrt the other issues

          • Using unique key constraints to protect against overwriting existing data, when intent is to just insert a new document (kinda what SOLR-3173 was thought to be about) - DocumentAlreadyExists error
          • Even though an document update fails, still try updating the rest of the documents in the same update request, and therefore the feature of being able to report several errors back to the client linking each error to a specific document in the request (basically SOLR-3382)
          • Nicely "typed" errors so that you as a client can detect which kind of error occurred in an easier way than guessing from http status code and textual error messages (also SOLR-3382)
          • The SOLR-3428 bug

          I need to know what to tell my product owners wrt how long we need to run on our own version of Solr (containing our patches etc.) in our high-throughput and high-concurrency pilot-prod system and therefore also for how long we need to keep merging changes from Solr trunk in? And when we can get to contribute additional code - mostly stability/endurance-related bug fixes. We already gave you the fix for SOLR-3437 but some of our additional fixes are kinda dependent on some of the stuff in SOLR-3173_3178_3382_3428_plus.patch. Do you have any plans revealing if we have to wait

          • Forever? (= our patch is not going to be committed)
          • Weeks/Days? (= you are working on understanding our patch and expect to commit later)
          • Or?

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - - edited Sad to see that you chose to commit a different patch for this one, making me believe that you are not going to take my patch in. I really dont understand why, because I think my solution is at least as good or better in any aspect, and besides that it is more complete, backward compatible and solves a lot of other issues ( SOLR-3173 , SOLR-3382 and SOLR-3428 ) already - can I ask you to tell me why you made that choice? Maybe what is wrong with our patch since it is not taken in, what we need to change etc? Please also elaborate on your plans wrt the other issues Using unique key constraints to protect against overwriting existing data, when intent is to just insert a new document (kinda what SOLR-3173 was thought to be about) - DocumentAlreadyExists error Even though an document update fails, still try updating the rest of the documents in the same update request, and therefore the feature of being able to report several errors back to the client linking each error to a specific document in the request (basically SOLR-3382 ) Nicely "typed" errors so that you as a client can detect which kind of error occurred in an easier way than guessing from http status code and textual error messages (also SOLR-3382 ) The SOLR-3428 bug I need to know what to tell my product owners wrt how long we need to run on our own version of Solr (containing our patches etc.) in our high-throughput and high-concurrency pilot-prod system and therefore also for how long we need to keep merging changes from Solr trunk in? And when we can get to contribute additional code - mostly stability/endurance-related bug fixes. We already gave you the fix for SOLR-3437 but some of our additional fixes are kinda dependent on some of the stuff in SOLR-3173 _3178_3382_3428_plus.patch. Do you have any plans revealing if we have to wait Forever? (= our patch is not going to be committed) Weeks/Days? (= you are working on understanding our patch and expect to commit later) Or? Regards, Per Steffensen
          Hide
          Yonik Seeley added a comment -

          I had missed this last update during the crazy week of the lucene revolution conference.

          It's just a guess, but I think it unlikely any committers would feel comfortable tackling this big patch, or even have time to understand all of the different aspects. They may agree with some parts but disagree with other parts. Small, tightly focused patches that tackle a single aspect are much more likely to get attention.

          Just speaking for myself, when I get time to look at optimistic locking stuff again, I'll probably browse your latest patch to see what improvements I can cherrypick (assuming no one else gets around to making some smaller patches first).

          Show
          Yonik Seeley added a comment - I had missed this last update during the crazy week of the lucene revolution conference. It's just a guess, but I think it unlikely any committers would feel comfortable tackling this big patch, or even have time to understand all of the different aspects. They may agree with some parts but disagree with other parts. Small, tightly focused patches that tackle a single aspect are much more likely to get attention. Just speaking for myself, when I get time to look at optimistic locking stuff again, I'll probably browse your latest patch to see what improvements I can cherrypick (assuming no one else gets around to making some smaller patches first).
          Hide
          Per Steffensen added a comment -

          It's just a guess, but I think it unlikely any committers would feel comfortable tackling this big patch, or even have time to understand all of the different aspects. They may agree with some parts but disagree with other parts. Small, tightly focused patches that tackle a single aspect are much more likely to get attention.

          I understand, but things are very related, so its hard for me to go back and divide into smaller patches and feed them one by one to you. Things are related because "optimistic locking"-errors and "unique key constraints"-errors are not worth much if exact information about what went wrong does not go back to the client (therefore adding the feature of sending typed errors back to the client is needed). And the two kinds on new errros are the first "single document"-related errors in Solr, and therefore it, for the first time in history of Solr, is very nice to have the ability to keep processing documents in a multi-document-update-request even though one document-update fails with one of those errors (therefore multi-error responses with references on each error to the document it relates to is needed). And backward compability is just always nice, and it this case required, IMHO

          Just speaking for myself, when I get time to look at optimistic locking stuff again, I'll probably browse your latest patch to see what improvements I can cherrypick (assuming no one else gets around to making some smaller patches first).

          I would really like that, and I will be available for any help you need. Or just make me a committer, and you will live happily ever after I am very experienced doing this kind of stuff.

          BTW, I saw that you committed some "inc" stuff on this JIRA. As I understand it, it is a new feature where you are able to tell server-side Solr to add to a field, instead of doing it yourself on client side. Server-side Solr can then do it while having the lock on the document, and therefore no optimistic locking is needed in principle - but only if your updates use "inc"s only. I think it deserves its own JIRA, because it doesnt have anything to do with optimistic locking, except that you do not need optimistic locking if your updates use "inc"s only. It is probably not realistic that any real world application would ever be able to do its updates based on "inc"s alone, so in isolation I think the feature is kinda useless. But if you generalize on the feature it could be great. Generalize in a way where you can have any update-semantic performed on your document server-side (while having the document lock), then utilizing this feature heavily would enable you to avoid using optimistic locking and that might be prefered in some situations - just even another reason while backward compability on optimistic locking is nice, so that you can have it disabled (even though you would like to have a full-featured Cloud-setup with updateLog, version field etc.) if you are not going to use it.

          I imagine that you can install document(-field)-manipulating primitives on server-side Solr and associate them with a names (e.g. associate the name "inc" with a piece of code adding to a field, "multiply" with a piece of code multiplying to a field, "string-concat" with a piece of code that concats to fields into a third field, etc) and then have pairs of document-field and document(-field)-manipulating-ref in your updates (e.g. send updates saying "update document A manipulate fields inc(field1,13), multiply(field2,4), string-concat(field3,field4,field5)"). But as long as "inc"s is the only such manipulate-on-server-side-while-having-the-lock-primitive it is kinda useless. The code you associate with primitives ("inc", "multiply", "string-concat" etc) could just be a small piece of Java code, and the association needs to go on in solrconfig.xml. Or if you are really cool (and you trust your clients are not doing SQL-injection-ish hacks on you) you can allow clients to install primitives on-demand, either as part of update-requests that use those primitives or as separate primitive-registration-request that you do before you start using a primitive. Such request-carried primitives could be in the form of Groovy code, Ruby code, JavaScript code, or whatever the JVM is able to compile and use on-demand.

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - It's just a guess, but I think it unlikely any committers would feel comfortable tackling this big patch, or even have time to understand all of the different aspects. They may agree with some parts but disagree with other parts. Small, tightly focused patches that tackle a single aspect are much more likely to get attention. I understand, but things are very related, so its hard for me to go back and divide into smaller patches and feed them one by one to you. Things are related because "optimistic locking"-errors and "unique key constraints"-errors are not worth much if exact information about what went wrong does not go back to the client (therefore adding the feature of sending typed errors back to the client is needed). And the two kinds on new errros are the first "single document"-related errors in Solr, and therefore it, for the first time in history of Solr, is very nice to have the ability to keep processing documents in a multi-document-update-request even though one document-update fails with one of those errors (therefore multi-error responses with references on each error to the document it relates to is needed). And backward compability is just always nice, and it this case required, IMHO Just speaking for myself, when I get time to look at optimistic locking stuff again, I'll probably browse your latest patch to see what improvements I can cherrypick (assuming no one else gets around to making some smaller patches first). I would really like that, and I will be available for any help you need. Or just make me a committer, and you will live happily ever after I am very experienced doing this kind of stuff. BTW, I saw that you committed some "inc" stuff on this JIRA. As I understand it, it is a new feature where you are able to tell server-side Solr to add to a field, instead of doing it yourself on client side. Server-side Solr can then do it while having the lock on the document, and therefore no optimistic locking is needed in principle - but only if your updates use "inc"s only. I think it deserves its own JIRA, because it doesnt have anything to do with optimistic locking, except that you do not need optimistic locking if your updates use "inc"s only. It is probably not realistic that any real world application would ever be able to do its updates based on "inc"s alone, so in isolation I think the feature is kinda useless. But if you generalize on the feature it could be great. Generalize in a way where you can have any update-semantic performed on your document server-side (while having the document lock), then utilizing this feature heavily would enable you to avoid using optimistic locking and that might be prefered in some situations - just even another reason while backward compability on optimistic locking is nice, so that you can have it disabled (even though you would like to have a full-featured Cloud-setup with updateLog, version field etc.) if you are not going to use it. I imagine that you can install document(-field)-manipulating primitives on server-side Solr and associate them with a names (e.g. associate the name "inc" with a piece of code adding to a field, "multiply" with a piece of code multiplying to a field, "string-concat" with a piece of code that concats to fields into a third field, etc) and then have pairs of document-field and document(-field)-manipulating-ref in your updates (e.g. send updates saying "update document A manipulate fields inc(field1,13), multiply(field2,4), string-concat(field3,field4,field5)"). But as long as "inc"s is the only such manipulate-on-server-side-while-having-the-lock-primitive it is kinda useless. The code you associate with primitives ("inc", "multiply", "string-concat" etc) could just be a small piece of Java code, and the association needs to go on in solrconfig.xml. Or if you are really cool (and you trust your clients are not doing SQL-injection-ish hacks on you) you can allow clients to install primitives on-demand, either as part of update-requests that use those primitives or as separate primitive-registration-request that you do before you start using a primitive. Such request-carried primitives could be in the form of Groovy code, Ruby code, JavaScript code, or whatever the JVM is able to compile and use on-demand. Regards, Per Steffensen
          Hide
          Yonik Seeley added a comment -

          Regarding error handling, I tracked down the original issue: SOLR-445

          BTW, I saw that you committed some "inc" stuff on this JIRA.

          An accident... I was traveling and got the JIRA number wrong on the commit. The JIRA for updateable documents (inc/add/set, etc) is
          SOLR-139

          Show
          Yonik Seeley added a comment - Regarding error handling, I tracked down the original issue: SOLR-445 BTW, I saw that you committed some "inc" stuff on this JIRA. An accident... I was traveling and got the JIRA number wrong on the commit. The JIRA for updateable documents (inc/add/set, etc) is SOLR-139
          Hide
          Per Steffensen added a comment - - edited

          Regarding error handling, I tracked down the original issue: SOLR-445

          Yes, SOLR-445 is solved by my patch - the nice way. On certain kinds of errors (PartialError subclasses) during the handling of a particular document in a nultidocument/batch-update the processing of subsequent documents will continue. The client will receive a response describing all errors (wrapped in PartialErrors) that happend during the processing of the entire update-request (multidocument/batch). Please have a look at http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics#Multi_document_updates

          It's just a guess, but I think it unlikely any committers would feel comfortable tackling this big patch, or even have time to understand all of the different aspects. They may agree with some parts but disagree with other parts

          Of course that is up to you, but I believe Solr has a problem being a real Open Source project receiving contributions from many semi-related organistions around the world, if you do not "trust your test suite". Basically when taking in a patch a committer do not need to understand everything down to every detail. It should be enough (if you "trust your test suite") to

          • Verify that all existing tests are still green - and havnt been hacked
          • Verify that all new tests seem to be meaningfull and covering the features described in the corresponding Jira (and in my case the associated Wiki page), indicating that the new features are usefull and well tested (in order to be able to "trust the test suite" will reveal if future commits will ruin this new feature)
          • Scan through the new code to see if it is breaking any design principals etc., and in general if it seems to be doing the right thing the right way

          As long as a patch does not break any existing functionality, and seems to bring nice new functionality (you should be able to see that from the added tests) a patch cannot be that harmfull - you can always refactor if you realize that you "disagree with some parts". It all depends on "trusting your test suite". Dont you agree, in principle at least?

          Regards, Per Steffensen

          Show
          Per Steffensen added a comment - - edited Regarding error handling, I tracked down the original issue: SOLR-445 Yes, SOLR-445 is solved by my patch - the nice way. On certain kinds of errors (PartialError subclasses) during the handling of a particular document in a nultidocument/batch-update the processing of subsequent documents will continue. The client will receive a response describing all errors (wrapped in PartialErrors) that happend during the processing of the entire update-request (multidocument/batch). Please have a look at http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics#Multi_document_updates It's just a guess, but I think it unlikely any committers would feel comfortable tackling this big patch, or even have time to understand all of the different aspects. They may agree with some parts but disagree with other parts Of course that is up to you, but I believe Solr has a problem being a real Open Source project receiving contributions from many semi-related organistions around the world, if you do not "trust your test suite". Basically when taking in a patch a committer do not need to understand everything down to every detail. It should be enough (if you "trust your test suite") to Verify that all existing tests are still green - and havnt been hacked Verify that all new tests seem to be meaningfull and covering the features described in the corresponding Jira (and in my case the associated Wiki page), indicating that the new features are usefull and well tested (in order to be able to "trust the test suite" will reveal if future commits will ruin this new feature) Scan through the new code to see if it is breaking any design principals etc., and in general if it seems to be doing the right thing the right way As long as a patch does not break any existing functionality, and seems to bring nice new functionality (you should be able to see that from the added tests) a patch cannot be that harmfull - you can always refactor if you realize that you "disagree with some parts". It all depends on "trusting your test suite". Dont you agree, in principle at least? Regards, Per Steffensen
          Hide
          Hoss Man added a comment -

          bulk fixing the version info for 4.0-ALPHA and 4.0 all affected issues have "hoss20120711-bulk-40-change" in comment

          Show
          Hoss Man added a comment - bulk fixing the version info for 4.0-ALPHA and 4.0 all affected issues have "hoss20120711-bulk-40-change" in comment
          Hide
          Per Steffensen added a comment - - edited

          SOLR-3173_3178_3382_3428_plus.patch updated to fit on top of revision 1355667 on 4.x branch.

          I have tried to describe for (almost) all files in the patch, which problem the changes in them solve. Please see below. A lot of files has changed, but it is very small changes in most of the files basically just making sure SolrRequestInfo-threadlocal is set in tests and other stuff like that. The solutions to SOLR-3173 and SOLR-3178 are not very big. Actually much more code has been done to solve SOLR-3382. With those descriptions I hope to convince you to do the commit. Trust you test-suite - it has not been changed, except that additional asserts has been added.

          I am really not religious about whether the version-control code is in DistributedUpdateProcessor or DirectUpdateHandler2, but I am religious about being backward compatible wrt such fundamental changes in semantics, and I am religious about getting typed errors/exceptions back in responses, so that I can react properly. I do believe version-control belong in DirectUpdateHandler2, but if you want it in DistributedUpdateProcessor, basically just refactor (after commit) and move the changes in DirectUpdateHandler2 to DistributedUpdateHandler.

          • Solved SOLR-3173 and SOLR-3178 (implementation of fail-on-unique-key-violation and version-check), by introducing 3 "modes" Solr server can run in (controlled by "semanticsMode" in solrconfig.xml | updateHandler) - one for backward compatibility, one for a very strict version control, and a hybrid (default).
            • solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java (the new features are controlled from here, but basically using the rules in UpdateSemanticsMode. Got rid of duplicate code in constructors - grrrr)
            • solr/core/src/java/org/apache/solr/update/UpdateSemanticsMode.java (enum that encapsulates the essence in the 3 possible "semanticsModes". Used by DirectUpdateHandler2. Nice "separation of concern")
            • solr/core/src/java/org/apache/solr/update/UpdateCommand.java (added leaderLogic calculated (as isLeader) in DistributedUpdateProcesser to UpdateCommand so that it is available in DirectUpdateHandler2, where it is used to know if the new checks must be performed or not. UpdateCommand also carrying requestVersion, the version sent in the request)
            • solr/core/src/java/org/apache/solr/core/SolrConfig.java (control via configuration which semantics-mode to use)
            • solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java (maintains leaderLogic on UpdateCommand)
            • solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java (maintains requestVersion on UpdateCommand)
            • solr/core/src/java/org/apache/solr/update/UpdateLog.java (maintains requestVersion on UpdateCommand)
            • solr/core/src/java/org/apache/solr/update/PeerSync.java (maintains leaderLogic and requestVersion on UpdateCommand)
            • solr/core/src/test-files/solr/collection1/conf/solrconfig-classic-semantics.xml (default test solrconfig.xml but with "classic" semantics-mode, used to test that the new features fail-on-unique-key-violation and version-check are not activated when running "classic" semantics - the backward compatible "mode")
            • solr/core/src/test-files/solr/collection1/conf/solrconfig-classic-consistency-hybrid-semantics.xml (default test solrconfig.xml but with "classic-consistency-hybrid" semantics-mode. Actually default solrconfig.xml could have been used since "classic-consistency-hybrid" semantics is default, but used to test that it (also) works when this mode is selected explicitly)
            • solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java and solr/test-framework/src/java/org/apache/solr/util/AbstractSolrTestCase.java and solr/test-framework/src/java/org/apache/solr/util/TestHarness.java (basically just a lot of changes making testing easier)
          • Removed the (temporary and poor IMHO) implementation of fail-on-unique-key-violation and version-check, but kept the introduced tests, so you can see that the same functionality is still provided (on default semantics "classic-consistency-hybrid")
            • solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java
            • solr/core/src/test/org/apache/solr/search/TestRealTimeGet.java (now we do not only get exceptions in "bad" situations, we get typed exceptions telling us what is wrong)
          • Solved SOLR-3382 - partRef and PartialError(s) introduced in multi-document-update-requests so that "request parts" (essentially documents sent for update) can be paired up with errors in responses, so that the client will know that document-updates failed with what errors and therefore also which documents where updated successfully
            • solr/solrj/src/java/org/apache/solr/common/RequestPart.java (new generic interface to be implemented by all parts of a request that is "just a part", and where an error that occures during the handling of this particular part does not mean that handling other parts will also result in error. Just to make it a little generic from the start, but by now only SolrInputDocuments in update-requests are such "request-parts")
            • solr/solrj/src/java/org/apache/solr/common/RequestPartImpl.java (new default impl of RequestPart, to be used by classes that really want to be a "request-part")
            • solr/solrj/src/java/org/apache/solr/common/SolrInputDocument.java (a "request part" implemented using RequestPartImpl)
            • solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java (dealing with errors returned in responses. removed a poor initial but never finished attempt to transport errors in responses, and replaced with my way of doing it (complete and finished solution). this indicates a problem with the approach of only accepting small patches - you very easily end up having partly implemented features that no-one ever finishes)
            • solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.java (dealing with errors returned in responses)
            • solr/solrj/src/java/org/apache/solr/common/partialerrors/WrongUsage.java (new error type indicating that the client is using the system in a wrong way - basically that a request is inconsistent with server rules about how the request must be)
            • solr/solrj/src/java/org/apache/solr/common/partialerrors/PartialError.java (super-class of all partial-error-types)
            • solr/solrj/src/java/org/apache/solr/common/partialerrors/update/DocumentUpdatePartialError.java (super-class of all partial-error-types that has to do with updates)
            • solr/solrj/src/java/org/apache/solr/common/partialerrors/update/DocumentAlreadyExists.java (new error type indicating that the document you tried to create (not update) already exists)
            • solr/solrj/src/java/org/apache/solr/common/partialerrors/update/DocumentDoesNotExist.java (new error type indicating that the document you tried to update (not create) does not already exist (maybe it has been deleted))
            • solr/solrj/src/java/org/apache/solr/common/partialerrors/update/VersionConflict.java (new error type indicating that there is a version-mismatch in the document that the client tries to update. Basically the client loaded a version of the document, made some updates to it and sent it to the server for storage of the changes, but the document on the server has changed since the document was loaded by the client, and the client has therefore created his updated document based on a obsolete version of the document. Nothing to do but reload, change and attempt update again)
            • solr/solrj/src/java/org/apache/solr/common/partialerrors/PartialErrors.java (new error type able to carry a map between "request parts" and the errors that occurred when they where handled)
            • solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java (catching PartialErrors (defined as errors only related to the document currently being handled) and continues with next document after adding it to response linked to document that triggered the error)
            • solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java (encodes errors. If multiple partial-errors occurred (e.g. multi-document-updates) a body is still needed in the response, telling about the map between errors and the documents they occurred for. If only single error occurred still just report that back using HTTP-error-code and reason-phrase. Also set HTTP header including the error-type (namespace + name = java-package + java-class-name). The actual encoding/decoding of errors in responses is isolated in SolrException)
            • solr/solrj/src/java/org/apache/solr/common/SolrException.java (encapsulating the logic to encode/decode errors in responses - also visible for SolrJ clients so we can decode there in a nice way)
            • solr/core/src/java/org/apache/solr/response/SolrQueryResponse.java (able to carry information about partial-errors and handled "parts" of a request)
            • solr/core/src/java/org/apache/solr/handler/ContentStreamLoader.java and solr/core/src/java/org/apache/solr/handler/loader/ContentStreamLoader.java (a few constants - strings used to identify partRefs in requests - why two classes!?!)
            • solr/core/src/java/org/apache/solr/handler/loader/JsonLoader.java (partRef can be sent in JSON requests)
            • solr/core/src/java/org/apache/solr/handler/loader/CSVLoaderBase.java (partRefs can be sent in CSV requests)
            • solr/core/src/java/org/apache/solr/handler/loader/XMLLoader.java (partRefs can be sent in XML requests)
            • solr/solrj/src/java/org/apache/solr/client/solrj/util/ClientUtils.java (partRefs can be sent in XML requests)
            • solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java (partRefs can be sent in binary requests)
            • solr/solrj/src/java/org/apache/solr/client/solrj/SolrResponse.java (gives access to partial-errors, map between "request parts" (essentially documents) and corresponding partial-errors, number of partial-errors etc.)
            • solr/solrj/src/java/org/apache/solr/client/solrj/response/UpdateResponse.java (see SolrResponse.java)
            • solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java and solr/test-framework/src/java/org/apache/solr/util/AbstractSolrTestCase.java and solr/test-framework/src/java/org/apache/solr/util/TestHarness.java (return errors in helper-assertFailed-methods, so that those errors can be inspected in further more specific assertions - basically just a lot of changes making testing easier)
            • solr/solrj/src/test/org/apache/solr/client/solrj/SolrExampleTests.java (nicer asserts including asserting on correct type (class) of exception)
            • solr/core/src/test/org/apache/solr/handler/JsonLoaderTest.java (test of it when using JSON requests and responses)
            • solr/core/src/test/org/apache/solr/handler/CSVRequestHandlerTest.java
          • Solved SOLR-3428
            • solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java
          • In DirectUpdateHandler2 (solving SOLR-3173 and SOLR-3178) and other places Im working with the SolrQueryRequest and SolrQueryResponse found in the threadlocal of SolrRequestInfo, so therefore this threadlocal needs to be set (and cleared) in many tests
            • solr/core/src/test/org/apache/solr/update/processor/FieldMutatingUpdateProcessorTest.java
            • solr/core/src/test/org/apache/solr/update/processor/SignatureUpdateProcessorFactoryTest.java
            • solr/core/src/test/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactoryTest.java
            • solr/core/src/test/org/apache/solr/update/TestIndexingPerformance.java
            • solr/core/src/test/org/apache/solr/request/TestBinaryResponseWriter.java
            • solr/core/src/test/org/apache/solr/request/TestFaceting.java
            • solr/core/src/test/org/apache/solr/core/TestArbitraryIndexDir.java
            • solr/core/src/test/org/apache/solr/TestGroupingSearch.java
            • solr/core/src/test/org/apache/solr/SampleTest.java
            • solr/core/src/test/org/apache/solr/EchoParamsTest.java
            • solr/core/src/test/org/apache/solr/search/function/TestFunctionQuery.java
            • solr/core/src/test/org/apache/solr/search/TestValueSourceCache.java
            • solr/core/src/test/org/apache/solr/search/TestSearchPerf.java
            • solr/core/src/test/org/apache/solr/search/QueryEqualityTest.java
            • solr/core/src/test/org/apache/solr/search/TestSort.java
            • solr/core/src/test/org/apache/solr/cloud/BasicZkTest.java
            • solr/core/src/test/org/apache/solr/highlight/FastVectorHighlighterTest.java
            • solr/core/src/test/org/apache/solr/highlight/HighlighterConfigTest.java
            • solr/core/src/test/org/apache/solr/highlight/HighlighterTest.java
            • solr/core/src/test/org/apache/solr/BasicFunctionalityTest.java
            • solr/core/src/test/org/apache/solr/DisMaxRequestHandlerTest.java
            • solr/core/src/test/org/apache/solr/handler/XsltUpdateRequestHandlerTest.java
            • solr/core/src/test/org/apache/solr/handler/StandardRequestHandlerTest.java
            • solr/core/src/test/org/apache/solr/handler/component/StatsComponentTest.java
            • solr/core/src/test/org/apache/solr/handler/MoreLikeThisHandlerTest.java
            • solr/core/src/java/org/apache/solr/servlet/DirectSolrConnection.java
            • solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/AbstractDataImportHandlerTestCase.java
            • solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestDocBuilder2.java
            • solr/contrib/uima/src/test/org/apache/solr/uima/processor/UIMAUpdateRequestProcessorTest.java
            • solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java
            • solr/test-framework/src/java/org/apache/solr/util/AbstractSolrTestCase.java
            • solr/test-framework/src/java/org/apache/solr/util/TestHarness.java
          • SolrRequest sub-classes has a lot of duplicated code (grrrr). Cleaned up and put shared code in one place - SolrRequest
            • solr/solrj/src/java/org/apache/solr/client/solrj/SolrRequest.java
            • solr/solrj/src/java/org/apache/solr/client/solrj/request/AbstractUpdateRequest.java
            • solr/solrj/src/java/org/apache/solr/client/solrj/request/FieldAnalysisRequest.java
            • solr/solrj/src/java/org/apache/solr/client/solrj/request/DirectXmlRequest.java
            • solr/solrj/src/java/org/apache/solr/client/solrj/request/LukeRequest.java
            • solr/solrj/src/java/org/apache/solr/client/solrj/request/CoreAdminRequest.java
            • solr/solrj/src/java/org/apache/solr/client/solrj/request/DocumentAnalysisRequest.java
            • solr/solrj/src/java/org/apache/solr/client/solrj/request/SolrPing.java
            • solr/solrj/src/java/org/apache/solr/client/solrj/request/QueryRequest.java
          • Think we should work with a common (for both server and client side) "version" String constant, but VersionInfo is not available for client side, so I removed VERSION_FIELD = "version" from VersionInfo to SolrInputDocument in order to make it available for both server and client (SolrJ) side.
            • solr/core/src/java/org/apache/solr/update/VersionInfo.java
            • solr/solrj/src/java/org/apache/solr/common/SolrInputDocument.java
            • solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
            • solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java
            • solr/core/src/test/org/apache/solr/update/DirectUpdateHandlerTest.java
            • solr/core/src/test/org/apache/solr/cloud/FullSolrCloudDistribCmdsTest.java
          • Added some new tests with focus on testing the new version- and unique-key-based features - ability to fail instead of overwriting on unique-key violation and ability to report errors on version conflict
            • solr/core/src/test/org/apache/solr/update/ClassicConsistencyHybridUpdateSemanticsTest.java
            • solr/core/src/test/org/apache/solr/update/ClassicConsistencyHybridUpdateSemanticsPartialErrorsTest.java
            • solr/core/src/test/org/apache/solr/update/ClassicUpdateSemanticsTest.java
            • solr/core/src/test/org/apache/solr/cloud/ClassicConsistencyHybridUpdateSemanticsSolrCloudTest.java
            • solr/solrj/src/test/org/apache/solr/client/update/ClassicConsistencyHybridUpdateSemanticsTest.java
            • solr/solrj/src/test/org/apache/solr/client/update/ClassicConsistencyHybridUpdateSemanticsConcurrencyTest.java
          • Renamed AddUpdateCommand.overwrite to AddUpdateCommand.classicOverwrite, because this is a field that is now only used when running with "classic" semantics
            • solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java
            • solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java
            • solr/core/src/java/org/apache/solr/handler/loader/JsonLoader.java
            • solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java
            • solr/core/src/java/org/apache/solr/handler/loader/CSVLoaderBase.java
            • solr/core/src/java/org/apache/solr/handler/loader/XMLLoader.java
            • solr/contrib/extraction/src/java/org/apache/solr/handler/extraction/ExtractingDocumentLoader.java
            • solr/solrj/src/java/org/apache/solr/client/solrj/request/UpdateRequestExt.java
            • solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java
            • solr/core/src/test/org/apache/solr/update/TestIndexingPerformance.java
            • solr/core/src/test/org/apache/solr/handler/XmlUpdateRequestHandlerTest.java
            • solr/core/src/test/org/apache/solr/handler/BinaryUpdateRequestHandlerTest.java
            • etc
          • When putting config-files into ZK you can have them named something different in there than their file-name on disk. E.g. ./my/folder/my-solrconfig.xml can be named solrconfig.xml in ZK
            • solr/core/src/test/org/apache/solr/cloud/AbstractZkTestCase.java
            • solr/core/src/test/org/apache/solr/cloud/AbstractDistributedZkTestCase.java
          • Misc unimportant cleanup - removed unneeded imports, etc.
            • solr/core/src/test/org/apache/solr/update/DirectUpdateHandlerOptimizeTest.java
            • solr/core/src/test/org/apache/solr/request/TestWriterPerf.java (just a nicer way to look and afterwards use rsp.getException())
            • solr/core/src/java/org/apache/solr/servlet/DirectSolrConnection.java (just a nicer way to look and afterwards use rsp.getException())
            • solr/core/src/java/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.java (just a nicer way to look and afterwards use rsp.getException())
            • solr/solrj/src/java/org/apache/solr/client/solrj/impl/LBHttpSolrServer.java
          Show
          Per Steffensen added a comment - - edited SOLR-3173 _3178_3382_3428_plus.patch updated to fit on top of revision 1355667 on 4.x branch. I have tried to describe for (almost) all files in the patch, which problem the changes in them solve. Please see below. A lot of files has changed, but it is very small changes in most of the files basically just making sure SolrRequestInfo-threadlocal is set in tests and other stuff like that. The solutions to SOLR-3173 and SOLR-3178 are not very big. Actually much more code has been done to solve SOLR-3382 . With those descriptions I hope to convince you to do the commit. Trust you test-suite - it has not been changed, except that additional asserts has been added. I am really not religious about whether the version-control code is in DistributedUpdateProcessor or DirectUpdateHandler2, but I am religious about being backward compatible wrt such fundamental changes in semantics, and I am religious about getting typed errors/exceptions back in responses, so that I can react properly. I do believe version-control belong in DirectUpdateHandler2, but if you want it in DistributedUpdateProcessor, basically just refactor (after commit) and move the changes in DirectUpdateHandler2 to DistributedUpdateHandler. Solved SOLR-3173 and SOLR-3178 (implementation of fail-on-unique-key-violation and version-check), by introducing 3 "modes" Solr server can run in (controlled by "semanticsMode" in solrconfig.xml | updateHandler) - one for backward compatibility, one for a very strict version control, and a hybrid (default). solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java (the new features are controlled from here, but basically using the rules in UpdateSemanticsMode. Got rid of duplicate code in constructors - grrrr) solr/core/src/java/org/apache/solr/update/UpdateSemanticsMode.java (enum that encapsulates the essence in the 3 possible "semanticsModes". Used by DirectUpdateHandler2. Nice "separation of concern") solr/core/src/java/org/apache/solr/update/UpdateCommand.java (added leaderLogic calculated (as isLeader) in DistributedUpdateProcesser to UpdateCommand so that it is available in DirectUpdateHandler2, where it is used to know if the new checks must be performed or not. UpdateCommand also carrying requestVersion, the version sent in the request) solr/core/src/java/org/apache/solr/core/SolrConfig.java (control via configuration which semantics-mode to use) solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java (maintains leaderLogic on UpdateCommand) solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java (maintains requestVersion on UpdateCommand) solr/core/src/java/org/apache/solr/update/UpdateLog.java (maintains requestVersion on UpdateCommand) solr/core/src/java/org/apache/solr/update/PeerSync.java (maintains leaderLogic and requestVersion on UpdateCommand) solr/core/src/test-files/solr/collection1/conf/solrconfig-classic-semantics.xml (default test solrconfig.xml but with "classic" semantics-mode, used to test that the new features fail-on-unique-key-violation and version-check are not activated when running "classic" semantics - the backward compatible "mode") solr/core/src/test-files/solr/collection1/conf/solrconfig-classic-consistency-hybrid-semantics.xml (default test solrconfig.xml but with "classic-consistency-hybrid" semantics-mode. Actually default solrconfig.xml could have been used since "classic-consistency-hybrid" semantics is default, but used to test that it (also) works when this mode is selected explicitly) solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java and solr/test-framework/src/java/org/apache/solr/util/AbstractSolrTestCase.java and solr/test-framework/src/java/org/apache/solr/util/TestHarness.java (basically just a lot of changes making testing easier) Removed the (temporary and poor IMHO) implementation of fail-on-unique-key-violation and version-check, but kept the introduced tests, so you can see that the same functionality is still provided (on default semantics "classic-consistency-hybrid") solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java solr/core/src/test/org/apache/solr/search/TestRealTimeGet.java (now we do not only get exceptions in "bad" situations, we get typed exceptions telling us what is wrong) Solved SOLR-3382 - partRef and PartialError(s) introduced in multi-document-update-requests so that "request parts" (essentially documents sent for update) can be paired up with errors in responses, so that the client will know that document-updates failed with what errors and therefore also which documents where updated successfully solr/solrj/src/java/org/apache/solr/common/RequestPart.java (new generic interface to be implemented by all parts of a request that is "just a part", and where an error that occures during the handling of this particular part does not mean that handling other parts will also result in error. Just to make it a little generic from the start, but by now only SolrInputDocuments in update-requests are such "request-parts") solr/solrj/src/java/org/apache/solr/common/RequestPartImpl.java (new default impl of RequestPart, to be used by classes that really want to be a "request-part") solr/solrj/src/java/org/apache/solr/common/SolrInputDocument.java (a "request part" implemented using RequestPartImpl) solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrServer.java (dealing with errors returned in responses. removed a poor initial but never finished attempt to transport errors in responses, and replaced with my way of doing it (complete and finished solution). this indicates a problem with the approach of only accepting small patches - you very easily end up having partly implemented features that no-one ever finishes) solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.java (dealing with errors returned in responses) solr/solrj/src/java/org/apache/solr/common/partialerrors/WrongUsage.java (new error type indicating that the client is using the system in a wrong way - basically that a request is inconsistent with server rules about how the request must be) solr/solrj/src/java/org/apache/solr/common/partialerrors/PartialError.java (super-class of all partial-error-types) solr/solrj/src/java/org/apache/solr/common/partialerrors/update/DocumentUpdatePartialError.java (super-class of all partial-error-types that has to do with updates) solr/solrj/src/java/org/apache/solr/common/partialerrors/update/DocumentAlreadyExists.java (new error type indicating that the document you tried to create (not update) already exists) solr/solrj/src/java/org/apache/solr/common/partialerrors/update/DocumentDoesNotExist.java (new error type indicating that the document you tried to update (not create) does not already exist (maybe it has been deleted)) solr/solrj/src/java/org/apache/solr/common/partialerrors/update/VersionConflict.java (new error type indicating that there is a version-mismatch in the document that the client tries to update. Basically the client loaded a version of the document, made some updates to it and sent it to the server for storage of the changes, but the document on the server has changed since the document was loaded by the client, and the client has therefore created his updated document based on a obsolete version of the document. Nothing to do but reload, change and attempt update again) solr/solrj/src/java/org/apache/solr/common/partialerrors/PartialErrors.java (new error type able to carry a map between "request parts" and the errors that occurred when they where handled) solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java (catching PartialErrors (defined as errors only related to the document currently being handled) and continues with next document after adding it to response linked to document that triggered the error) solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java (encodes errors. If multiple partial-errors occurred (e.g. multi-document-updates) a body is still needed in the response, telling about the map between errors and the documents they occurred for. If only single error occurred still just report that back using HTTP-error-code and reason-phrase. Also set HTTP header including the error-type (namespace + name = java-package + java-class-name). The actual encoding/decoding of errors in responses is isolated in SolrException) solr/solrj/src/java/org/apache/solr/common/SolrException.java (encapsulating the logic to encode/decode errors in responses - also visible for SolrJ clients so we can decode there in a nice way) solr/core/src/java/org/apache/solr/response/SolrQueryResponse.java (able to carry information about partial-errors and handled "parts" of a request) solr/core/src/java/org/apache/solr/handler/ContentStreamLoader.java and solr/core/src/java/org/apache/solr/handler/loader/ContentStreamLoader.java (a few constants - strings used to identify partRefs in requests - why two classes!?!) solr/core/src/java/org/apache/solr/handler/loader/JsonLoader.java (partRef can be sent in JSON requests) solr/core/src/java/org/apache/solr/handler/loader/CSVLoaderBase.java (partRefs can be sent in CSV requests) solr/core/src/java/org/apache/solr/handler/loader/XMLLoader.java (partRefs can be sent in XML requests) solr/solrj/src/java/org/apache/solr/client/solrj/util/ClientUtils.java (partRefs can be sent in XML requests) solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java (partRefs can be sent in binary requests) solr/solrj/src/java/org/apache/solr/client/solrj/SolrResponse.java (gives access to partial-errors, map between "request parts" (essentially documents) and corresponding partial-errors, number of partial-errors etc.) solr/solrj/src/java/org/apache/solr/client/solrj/response/UpdateResponse.java (see SolrResponse.java) solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java and solr/test-framework/src/java/org/apache/solr/util/AbstractSolrTestCase.java and solr/test-framework/src/java/org/apache/solr/util/TestHarness.java (return errors in helper-assertFailed-methods, so that those errors can be inspected in further more specific assertions - basically just a lot of changes making testing easier) solr/solrj/src/test/org/apache/solr/client/solrj/SolrExampleTests.java (nicer asserts including asserting on correct type (class) of exception) solr/core/src/test/org/apache/solr/handler/JsonLoaderTest.java (test of it when using JSON requests and responses) solr/core/src/test/org/apache/solr/handler/CSVRequestHandlerTest.java Solved SOLR-3428 solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java In DirectUpdateHandler2 (solving SOLR-3173 and SOLR-3178 ) and other places Im working with the SolrQueryRequest and SolrQueryResponse found in the threadlocal of SolrRequestInfo, so therefore this threadlocal needs to be set (and cleared) in many tests solr/core/src/test/org/apache/solr/update/processor/FieldMutatingUpdateProcessorTest.java solr/core/src/test/org/apache/solr/update/processor/SignatureUpdateProcessorFactoryTest.java solr/core/src/test/org/apache/solr/update/processor/UniqFieldsUpdateProcessorFactoryTest.java solr/core/src/test/org/apache/solr/update/TestIndexingPerformance.java solr/core/src/test/org/apache/solr/request/TestBinaryResponseWriter.java solr/core/src/test/org/apache/solr/request/TestFaceting.java solr/core/src/test/org/apache/solr/core/TestArbitraryIndexDir.java solr/core/src/test/org/apache/solr/TestGroupingSearch.java solr/core/src/test/org/apache/solr/SampleTest.java solr/core/src/test/org/apache/solr/EchoParamsTest.java solr/core/src/test/org/apache/solr/search/function/TestFunctionQuery.java solr/core/src/test/org/apache/solr/search/TestValueSourceCache.java solr/core/src/test/org/apache/solr/search/TestSearchPerf.java solr/core/src/test/org/apache/solr/search/QueryEqualityTest.java solr/core/src/test/org/apache/solr/search/TestSort.java solr/core/src/test/org/apache/solr/cloud/BasicZkTest.java solr/core/src/test/org/apache/solr/highlight/FastVectorHighlighterTest.java solr/core/src/test/org/apache/solr/highlight/HighlighterConfigTest.java solr/core/src/test/org/apache/solr/highlight/HighlighterTest.java solr/core/src/test/org/apache/solr/BasicFunctionalityTest.java solr/core/src/test/org/apache/solr/DisMaxRequestHandlerTest.java solr/core/src/test/org/apache/solr/handler/XsltUpdateRequestHandlerTest.java solr/core/src/test/org/apache/solr/handler/StandardRequestHandlerTest.java solr/core/src/test/org/apache/solr/handler/component/StatsComponentTest.java solr/core/src/test/org/apache/solr/handler/MoreLikeThisHandlerTest.java solr/core/src/java/org/apache/solr/servlet/DirectSolrConnection.java solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/AbstractDataImportHandlerTestCase.java solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestDocBuilder2.java solr/contrib/uima/src/test/org/apache/solr/uima/processor/UIMAUpdateRequestProcessorTest.java solr/test-framework/src/java/org/apache/solr/SolrTestCaseJ4.java solr/test-framework/src/java/org/apache/solr/util/AbstractSolrTestCase.java solr/test-framework/src/java/org/apache/solr/util/TestHarness.java SolrRequest sub-classes has a lot of duplicated code (grrrr). Cleaned up and put shared code in one place - SolrRequest solr/solrj/src/java/org/apache/solr/client/solrj/SolrRequest.java solr/solrj/src/java/org/apache/solr/client/solrj/request/AbstractUpdateRequest.java solr/solrj/src/java/org/apache/solr/client/solrj/request/FieldAnalysisRequest.java solr/solrj/src/java/org/apache/solr/client/solrj/request/DirectXmlRequest.java solr/solrj/src/java/org/apache/solr/client/solrj/request/LukeRequest.java solr/solrj/src/java/org/apache/solr/client/solrj/request/CoreAdminRequest.java solr/solrj/src/java/org/apache/solr/client/solrj/request/DocumentAnalysisRequest.java solr/solrj/src/java/org/apache/solr/client/solrj/request/SolrPing.java solr/solrj/src/java/org/apache/solr/client/solrj/request/QueryRequest.java Think we should work with a common (for both server and client side) " version " String constant, but VersionInfo is not available for client side, so I removed VERSION_FIELD = " version " from VersionInfo to SolrInputDocument in order to make it available for both server and client (SolrJ) side. solr/core/src/java/org/apache/solr/update/VersionInfo.java solr/solrj/src/java/org/apache/solr/common/SolrInputDocument.java solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java solr/core/src/test/org/apache/solr/update/DirectUpdateHandlerTest.java solr/core/src/test/org/apache/solr/cloud/FullSolrCloudDistribCmdsTest.java Added some new tests with focus on testing the new version- and unique-key-based features - ability to fail instead of overwriting on unique-key violation and ability to report errors on version conflict solr/core/src/test/org/apache/solr/update/ClassicConsistencyHybridUpdateSemanticsTest.java solr/core/src/test/org/apache/solr/update/ClassicConsistencyHybridUpdateSemanticsPartialErrorsTest.java solr/core/src/test/org/apache/solr/update/ClassicUpdateSemanticsTest.java solr/core/src/test/org/apache/solr/cloud/ClassicConsistencyHybridUpdateSemanticsSolrCloudTest.java solr/solrj/src/test/org/apache/solr/client/update/ClassicConsistencyHybridUpdateSemanticsTest.java solr/solrj/src/test/org/apache/solr/client/update/ClassicConsistencyHybridUpdateSemanticsConcurrencyTest.java Renamed AddUpdateCommand.overwrite to AddUpdateCommand.classicOverwrite, because this is a field that is now only used when running with "classic" semantics solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java solr/core/src/java/org/apache/solr/handler/loader/JsonLoader.java solr/core/src/java/org/apache/solr/handler/loader/JavabinLoader.java solr/core/src/java/org/apache/solr/handler/loader/CSVLoaderBase.java solr/core/src/java/org/apache/solr/handler/loader/XMLLoader.java solr/contrib/extraction/src/java/org/apache/solr/handler/extraction/ExtractingDocumentLoader.java solr/solrj/src/java/org/apache/solr/client/solrj/request/UpdateRequestExt.java solr/core/src/java/org/apache/solr/update/AddUpdateCommand.java solr/core/src/test/org/apache/solr/update/TestIndexingPerformance.java solr/core/src/test/org/apache/solr/handler/XmlUpdateRequestHandlerTest.java solr/core/src/test/org/apache/solr/handler/BinaryUpdateRequestHandlerTest.java etc When putting config-files into ZK you can have them named something different in there than their file-name on disk. E.g. ./my/folder/my-solrconfig.xml can be named solrconfig.xml in ZK solr/core/src/test/org/apache/solr/cloud/AbstractZkTestCase.java solr/core/src/test/org/apache/solr/cloud/AbstractDistributedZkTestCase.java Misc unimportant cleanup - removed unneeded imports, etc. solr/core/src/test/org/apache/solr/update/DirectUpdateHandlerOptimizeTest.java solr/core/src/test/org/apache/solr/request/TestWriterPerf.java (just a nicer way to look and afterwards use rsp.getException()) solr/core/src/java/org/apache/solr/servlet/DirectSolrConnection.java (just a nicer way to look and afterwards use rsp.getException()) solr/core/src/java/org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.java (just a nicer way to look and afterwards use rsp.getException()) solr/solrj/src/java/org/apache/solr/client/solrj/impl/LBHttpSolrServer.java
          Hide
          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Show
          Robert Muir added a comment - rmuir20120906-bulk-40-change
          Hide
          Per Steffensen added a comment -

          Updated patch SOLR-3173_3178_3382_3428_plus.patch to fit on top of Solr 4.0.0. Actually updated to fit on top of revision 1394844 of branch lucene_solr_4_0, but I believe this is the same as Solr 4.0.0.

          Show
          Per Steffensen added a comment - Updated patch SOLR-3173 _3178_3382_3428_plus.patch to fit on top of Solr 4.0.0. Actually updated to fit on top of revision 1394844 of branch lucene_solr_4_0, but I believe this is the same as Solr 4.0.0.

            People

            • Assignee:
              Per Steffensen
              Reporter:
              Per Steffensen
            • Votes:
              3 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:

                Time Tracking

                Estimated:
                Original Estimate - 168h
                168h
                Remaining:
                Remaining Estimate - 168h
                168h
                Logged:
                Time Spent - Not Specified
                Not Specified

                  Development