  1. Solr
  2. SOLR-445

Update Handlers abort with bad documents

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, 7.0
    • Component/s: update
    • Labels: None

      Description

      This issue adds a new TolerantUpdateProcessorFactory, making it possible to configure Solr updates so that they are "tolerant" of individual errors in an update request...

        <processor class="solr.TolerantUpdateProcessorFactory">
          <int name="maxErrors">10</int>
        </processor>
      

      When a chain with this processor is used, but maxErrors isn't exceeded, here's what the response looks like...

      $ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain&wt=json&indent=true&maxErrors=-1' -H "Content-Type: application/json" --data-binary '{"add" : { "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}'
      {
        "responseHeader":{
          "errors":[{
              "type":"ADD",
              "id":"1",
              "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
            {
              "type":"DELQ",
              "id":"malformed:[",
              "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    "}],
          "maxErrors":-1,
          "status":0,
          "QTime":1}}
      

      Note in the above example that:

      • maxErrors can be overridden on a per-request basis
      • an effective maxErrors==-1 (either from config, or request param) means "unlimited" (under the covers it's using Integer.MAX_VALUE)
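
The two notes above can be sketched as a small resolution function. This is only an illustration of the described behavior, not Solr's actual code; the function name `effective_max_errors` and its parameters are hypothetical.

```python
# Sketch only: mirrors the behavior described above, not Solr's source.
JAVA_INTEGER_MAX_VALUE = 2**31 - 1  # value of Java's Integer.MAX_VALUE


def effective_max_errors(configured, request_param=None):
    """Resolve the maxErrors limit that applies to a single update request.

    configured    -- maxErrors from the processor config (an int)
    request_param -- raw maxErrors request parameter, or None if absent
    """
    # A per-request value, when present, overrides the configured one.
    value = configured if request_param is None else int(request_param)
    # -1 means "unlimited", represented as Integer.MAX_VALUE.
    return JAVA_INTEGER_MAX_VALUE if value == -1 else value


print(effective_max_errors(10))        # config only
print(effective_max_errors(10, "-1"))  # request asks for unlimited
print(effective_max_errors(10, "1"))   # request lowers the limit
```

Treating -1 as a very large limit means the rest of the logic needs no special case for "unlimited".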

      If maxErrors is exceeded for a request, then the first exception that the processor caught is propagated back to the user, and metadata is set on that exception with the same details about all the tolerated errors.
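
A rough sketch of that control flow, under stated assumptions: `UpdateError`, `process_tolerantly`, and `fake_apply` are hypothetical names, and the `>` comparison is inferred from the examples in this description (a request with maxErrors=1 and two failing commands is rejected, with both errors reported). It is not the processor's real implementation.

```python
class UpdateError(Exception):
    """Hypothetical exception that can carry metadata about tolerated errors."""

    def __init__(self, message):
        super().__init__(message)
        self.metadata = []


def process_tolerantly(commands, apply, max_errors):
    """Apply each (type, id, payload) command, tolerating failures.

    Returns the list of tolerated errors. Once more than max_errors
    commands have failed, re-raises the first caught exception with
    details of every error seen so far attached as metadata.
    """
    errors = []       # one dict per failed command
    first_exc = None  # the first exception caught, kept for re-raising
    for cmd_type, cmd_id, payload in commands:
        try:
            apply(cmd_type, payload)
        except UpdateError as exc:
            if first_exc is None:
                first_exc = exc
            errors.append({"type": cmd_type, "id": cmd_id, "message": str(exc)})
            if len(errors) > max_errors:  # assumption: strictly "exceeded"
                first_exc.metadata = list(errors)
                raise first_exc
    return errors


def fake_apply(cmd_type, payload):
    # Stand-in for real indexing: fail whenever the payload is flagged bad.
    if payload.get("bad"):
        raise UpdateError("cannot apply %s" % cmd_type)


commands = [
    ("ADD", "1", {"bad": True}),
    ("ADD", "2", {}),
    ("DELQ", "malformed:[", {"bad": True}),
]
tolerated = process_tolerantly(commands, fake_apply, max_errors=10)
print(len(tolerated))
```

With `max_errors=1` the same call raises the first `UpdateError` (the one for ADD of doc 1), and its `metadata` lists both failures.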

      This next example repeats the curl request shown earlier, except that the request param is now maxErrors=1 instead of maxErrors=-1...

      $ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain&wt=json&indent=true&maxErrors=1' -H "Content-Type: application/json" --data-binary '{"add" : { "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}'
      {
        "responseHeader":{
          "errors":[{
              "type":"ADD",
              "id":"1",
              "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
            {
              "type":"DELQ",
              "id":"malformed:[",
              "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    "}],
          "maxErrors":1,
          "status":400,
          "QTime":1},
        "error":{
          "metadata":[
            "org.apache.solr.common.ToleratedUpdateError--ADD:1","ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"",
            "org.apache.solr.common.ToleratedUpdateError--DELQ:malformed:[","org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    ",
            "error-class","org.apache.solr.common.SolrException",
            "root-error-class","java.lang.NumberFormatException"],
          "msg":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"",
          "code":400}}
      

      ...the added exception metadata ensures that even in client code like the various SolrJ SolrClient implementations, which throw a (client side) exception on non-200 responses, the end user can access info on all the tolerated errors that were ignored before the maxErrors threshold was reached.
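
As an illustration of how a client might recover the tolerated errors from that metadata, here is a minimal sketch. It assumes the flat key/value layout and the `ToleratedUpdateError--TYPE:id` key format shown in the JSON response above; the function name `tolerated_errors` is hypothetical, and the messages in the sample data are shortened placeholders.

```python
PREFIX = "org.apache.solr.common.ToleratedUpdateError--"


def tolerated_errors(metadata):
    """Extract (type, id, message) tuples from a flat [key, value, ...] list."""
    found = []
    # Pair up alternating keys and values.
    for key, value in zip(metadata[0::2], metadata[1::2]):
        if key.startswith(PREFIX):
            # Split on the first ':' only, since the id may contain colons
            # (for example the delete query "malformed:[").
            err_type, _, err_id = key[len(PREFIX):].partition(":")
            found.append((err_type, err_id, value))
    return found


# Same shape as the "metadata" array above; messages shortened for brevity.
metadata = [
    PREFIX + "ADD:1", "add failed (message shortened)",
    PREFIX + "DELQ:malformed:[", "delete failed (message shortened)",
    "error-class", "org.apache.solr.common.SolrException",
    "root-error-class", "java.lang.NumberFormatException",
]
for err_type, err_id, message in tolerated_errors(metadata):
    print(err_type, err_id, message)
```

Entries such as error-class and root-error-class are skipped because their keys do not carry the tolerated-error prefix.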


      Original Jira Request

      Has anyone run into the problem of handling bad documents or failures mid-batch? For example:

      <add>
      <doc>
      <field name="id">1</field>
      </doc>
      <doc>
      <field name="id">2</field>
      <field name="myDateField">I_AM_A_BAD_DATE</field>
      </doc>
      <doc>
      <field name="id">3</field>
      </doc>
      </add>

      Right now Solr adds the first doc and then aborts. It seems like it should either (1) fail the entire batch, or (2) log a message/return a code and then continue on to add doc 3. Option 1 would seem to be much harder to accomplish and might require more memory, while Option 2 would require more information to come back from the API. I'm about to dig into this, but I thought I'd ask to see if anyone had any suggestions, thoughts, or comments.

        Attachments

        1. SOLR-445.patch (88 kB, Hoss Man)
        2. SOLR-445.patch (83 kB, Hoss Man)
        3. SOLR-445.patch (40 kB, Anshum Gupta)
        4. SOLR-445.patch (41 kB, Anshum Gupta)
        5. SOLR-445.patch (40 kB, Anshum Gupta)
        6. SOLR-445-alternative.patch (36 kB, Tomás Fernández Löbbe)
        7. SOLR-445-alternative.patch (26 kB, Tomás Fernández Löbbe)
        8. SOLR-445-alternative.patch (23 kB, Tomás Fernández Löbbe)
        9. SOLR-445-alternative.patch (23 kB, Tomás Fernández Löbbe)
        10. SOLR-445.patch (44 kB, Erick Erickson)
        11. SOLR-445_3x.patch (46 kB, Erick Erickson)
        12. SOLR-445.patch (45 kB, Erick Erickson)
        13. SOLR-445-3_x.patch (20 kB, Erick Erickson)
        14. SOLR-445.patch (18 kB, Erick Erickson)
        15. solr-445.xml (0.7 kB, Erick Erickson)
        16. SOLR-445.patch (42 kB, Erick Erickson)

              People

              • Assignee: Hoss Man (hossman)
              • Reporter: Will Johnson (willjohnson3)
              • Votes: 6
              • Watchers: 24
