Description
This issue adds a new TolerantUpdateProcessorFactory, making it possible to configure Solr updates so that they are "tolerant" of individual errors in an update request...
<processor class="solr.TolerantUpdateProcessorFactory">
  <int name="maxErrors">10</int>
</processor>
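For context, this processor element lives inside an updateRequestProcessorChain in solrconfig.xml. A minimal sketch of such a chain (the chain name "tolerant-chain" is assumed here to match the update.chain param used in the examples below; the trailing Distributed/Run processors are just the conventional chain tail, not something this issue mandates):

<updateRequestProcessorChain name="tolerant-chain">
  <processor class="solr.TolerantUpdateProcessorFactory">
    <int name="maxErrors">10</int>
  </processor>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>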
When a chain with this processor is used, but maxErrors isn't exceeded, here's what the response looks like...
$ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain&wt=json&indent=true&maxErrors=-1' -H "Content-Type: application/json" --data-binary '{"add" : { "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}'
{
  "responseHeader":{
    "errors":[{
        "type":"ADD",
        "id":"1",
        "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
      {
        "type":"DELQ",
        "id":"malformed:[",
        "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    "}],
    "maxErrors":-1,
    "status":0,
    "QTime":1}}
Note in the above example that:
- maxErrors can be overridden on a per-request basis
- an effective maxErrors==-1 (either from config, or request param) means "unlimited" (under the covers it's using Integer.MAX_VALUE)
If/when maxErrors is reached for a request, the first exception that the processor caught is propagated back to the user, and metadata is set on that exception with the same details about all of the tolerated errors.
This next example is the same as the previous one, except that the request param is now maxErrors=1 instead of maxErrors=-1...
$ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain&wt=json&indent=true&maxErrors=1' -H "Content-Type: application/json" --data-binary '{"add" : { "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}'
{
  "responseHeader":{
    "errors":[{
        "type":"ADD",
        "id":"1",
        "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
      {
        "type":"DELQ",
        "id":"malformed:[",
        "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    "}],
    "maxErrors":1,
    "status":400,
    "QTime":1},
  "error":{
    "metadata":[
      "org.apache.solr.common.ToleratedUpdateError--ADD:1","ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"",
      "org.apache.solr.common.ToleratedUpdateError--DELQ:malformed:[","org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    ",
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.lang.NumberFormatException"],
    "msg":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"",
    "code":400}}
...the added exception metadata ensures that even in client code like the various SolrJ SolrClient implementations, which throw a (client-side) exception on non-200 responses, the end user can access info on all of the tolerated errors that were ignored before the maxErrors threshold was reached.
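To illustrate, here's a minimal SolrJ sketch of reading those details client-side. It assumes the techproducts core and the tolerant-chain config from the examples above; the metadata-key prefix it matches on is the one visible in the error metadata of the previous response.

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.UpdateRequest;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.util.NamedList;

public class TolerantErrorsDemo {
  public static void main(String[] args) throws Exception {
    try (SolrClient client =
             new HttpSolrClient.Builder("http://localhost:8983/solr/techproducts").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");
      doc.addField("foo_i", "bogus"); // not parseable as an int -> a tolerated error

      UpdateRequest req = new UpdateRequest();
      req.add(doc);
      req.setParam("update.chain", "tolerant-chain");
      req.setParam("maxErrors", "0"); // a single failed add exceeds the threshold

      try {
        req.process(client);
      } catch (SolrException e) {
        // SolrJ rebuilds the exception metadata from the error response;
        // tolerated-error entries use keys of the form
        // "org.apache.solr.common.ToleratedUpdateError--<TYPE>:<id>"
        NamedList<String> metadata = e.getMetadata();
        if (metadata != null) {
          for (int i = 0; i < metadata.size(); i++) {
            String key = metadata.getName(i);
            if (key.startsWith("org.apache.solr.common.ToleratedUpdateError--")) {
              System.out.println(key + " => " + metadata.getVal(i));
            }
          }
        }
      }
    }
  }
}

The raw string matching on the key prefix is just for illustration; the point is that everything shown in the "metadata" array of the JSON response above is reachable from the caught exception.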
Has anyone run into the problem of handling bad documents / failures mid-batch? For example:
<add>
  <doc>
    <field name="id">1</field>
  </doc>
  <doc>
    <field name="id">2</field>
    <field name="myDateField">I_AM_A_BAD_DATE</field>
  </doc>
  <doc>
    <field name="id">3</field>
  </doc>
</add>
Right now Solr adds the first doc and then aborts. It seems like it should either (1) fail the entire batch, or (2) log a message / return a code and then continue on to add doc 3. Option 1 would be much harder to accomplish and would possibly require more memory, while Option 2 would require more information to come back from the API. I'm about to dig into this, but I thought I'd ask first to see if anyone has suggestions, thoughts, or comments.
Issue Links
- depends upon:
  - SOLR-8633 DistributedUpdateProcess processCommit/deleteByQuery call finish on DUP and SolrCmdDistributor, which violates the lifecycle and can cause bugs. (Resolved)
- is blocked by:
  - SOLR-8738 invalid DBQ initially sent to a non-leader node will report success (Resolved)
- is duplicated by:
  - SOLR-7914 Improve bulk doc update (Closed)
- is part of:
  - SOLR-3382 Finegrained error propagation (focus on multi-document updates) (Open)
- is related to:
  - SOLR-3178 Versioning - optimistic locking (Open)
  - SOLR-8881 test & document (and improve as possible) behavior of TolerantUpdateProcessor while shard splitting is in progress (Open)
  - SOLR-1113 Error reports from ExtractingRequestHandler and Co do not indicate name of rejected documents (Closed)
  - SOLR-8872 ChaosMonkey depends on AbstractFullDistribZkTestBase, can't be used with MiniSolrCloudCluster (Open)