Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
4.6, 7.5
-
None
Description
Replication API command=fetchindex do fetch the index. while occur the error, still give success response.
API should return the right status, especially WAIT parameter is true.(synchronous).
Attachments
Attachments
- SOLR-6117.patch
- 10 kB
- Jason Gerlowski
- SOLR-6117.patch
- 3 kB
- Shalin Shekhar Mangar
- SOLR-6117.patch
- 2 kB
- Shalin Shekhar Mangar
- SOLR-6117.txt
- 2 kB
- Raintung Li
- SOLR-6117-master.patch
- 22 kB
- Jason Gerlowski
Activity
Another iteration. I removed the exception from the Status class because the SnapPuller API is written to never throw exceptions. Instead, it just returns booleans for success/failure. Throwing exceptions is a very invasive change.
This patch just returns the right status. I've added a test as well. Is this sufficient for you Raintung?
It looks like all of the other commands also return OK_STATUS
else if (command.equals(CMD_GET_FILE)) { getFileStream(solrParams, rsp); } else if (command.equals(CMD_GET_FILE_LIST)) { getFileList(solrParams, rsp); } else if (command.equalsIgnoreCase(CMD_BACKUP)) { doSnapShoot(new ModifiableSolrParams(solrParams), rsp, req); rsp.add(STATUS, OK_STATUS); } else if (command.equalsIgnoreCase(CMD_DELETE_BACKUP)) { deleteSnapshot(new ModifiableSolrParams(solrParams), rsp, req); rsp.add(STATUS, OK_STATUS); }
The actual error stack trace gets printed in the logs. Should we change how the remaining also get handled?
The actual error stack trace gets printed in the logs. Should we change how the remaining also get handled?
I think an exception should be returned in the response if possible. The old days of just logging exceptions are gone. We can't expect users to sift through GBs of logs in SolrCloud to find the reason behind the failure. But that's a big change so I think we should do it in another issue.
Most recent attached patch is a slight update of Shalin's. I'd hoped to add a lot more tests with this that trigger the various failure conditions, but it's hard to reproduce many of them via JUnit. Also looked at adding unit tests for ReplicationHandler directly, but it relies heavily on SolrCore, which is final which makes mocking/stubbing difficult as well. If anyone sees a way to get more coverage on this without major surgery, I'd love to hear it.
The current patch makes sure that we never advertise a response as status=OK falsely, so it's just a bugfix and should be safe to include in branch_7x from a breaking-change perspective. There's a lot of other problems with the replication handler responses that would require breaking changes. Specifically:
- "status" is only present on some responses. Ideally it should be present on all /replication responses so that clients can rely on it being there.
- "status" is used inconsistently. Some uses give it an enum-like value that clients could key off of, others treat it like a "message" field and just give it random error messages
- when errors occur, the "message" and "exception" fields are used inconsistently. Ideally if an error occurs there would always be a message, and sometimes there would also be an exception.
- many of the error-cases involving argument-validation set the status field properly but return with the wrong HTTP status (200). (i.e. they should throw a SolrException).
I plan on working some of these out soon in a larger commit that can be put on master.
Attached an updated patch that's intended for the master branch, and thus has liberty to do more to make the various responses from the /replication API more uniform. This version of the patch addresses all of the bullet points in my previous comment. Haven't run tests more generally yet, but I hope to commit to master in the next week or so.
One thing I forgot to clarify in my previous comment: both of these patches address all subcommands in the /replication API (not just "fetchindex") That was a point of discussion in the original effort on this JIRA, so just thought I'd clarify.
Commit 1df106e0e726ca0fe83d5074973d91a0fb21fd0a in lucene-solr's branch refs/heads/branch_7x from gerlowskija
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1df106e ]
SOLR-6117: Return correct 'status' from ReplicationHandler
Commit c951578fcabfd5520f425156b0fa35f35811554d in lucene-solr's branch refs/heads/master from gerlowskija
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c951578 ]
SOLR-6117: Unify ReplicationHandler error handling
Prior to this commit, ReplicationHandler had a few inconsistencies
in how it reported errors:
- Sometimes the 'status' field was used as an enum (e.g. 'success'
v. 'failure'. Elsewhere it is used to hold error messages. - Sometimes an explicit 'message' field was used, but often not.
- Sometimes a stack trace was provided in place of an error message.
This commit tweaks the various error cases in ReplicationHandler to
report errors consistently. 'status' is always an enum-type value. A
'message' field is provided for all errors, with an optional 'exception'
field.
Marking this as 'Fixed' for 8.0 and 7.7. To summarize/clarify, the fixes on master and branch_7x are a little different based on the need to avoid potentially breaking changes on 7x. The 7x changes only go far enough to fix the bug where we return a
"success" status even when the request fails. The master changes do this, as well as correcting a few inconsistencies between the different error cases.
Thanks Raintung.
In this patch, I've made changes to store the actual exception and to print it in the response.