SOLR-445

Update Handlers abort with bad documents

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: master, 6.1
    • Component/s: update
    • Labels: None

      Description

      This issue adds a new TolerantUpdateProcessorFactory, which makes it possible to configure Solr updates so that they are "tolerant" of individual errors in an update request...

        <processor class="solr.TolerantUpdateProcessorFactory">
          <int name="maxErrors">10</int>
        </processor>
      

      When a chain with this processor is used, but maxErrors isn't exceeded, here's what the response looks like...

      $ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain&wt=json&indent=true&maxErrors=-1' -H "Content-Type: application/json" --data-binary '{"add" : { "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}'
      {
        "responseHeader":{
          "errors":[{
              "type":"ADD",
              "id":"1",
              "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
            {
              "type":"DELQ",
              "id":"malformed:[",
              "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    "}],
          "maxErrors":-1,
          "status":0,
          "QTime":1}}
      

      Note in the above example that:

      • maxErrors can be overridden on a per-request basis
      • an effective maxErrors==-1 (either from config, or request param) means "unlimited" (under the covers it's using Integer.MAX_VALUE)
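
      The two notes above can be sketched as a small resolution rule. This is only an illustrative Python model of the behaviour described here, not Solr code: the helper name effective_max_errors and its parameters are hypothetical, and the constant simply mirrors Java's Integer.MAX_VALUE.

```python
from typing import Optional

# Mirrors Java's Integer.MAX_VALUE, which the description says backs "unlimited".
INTEGER_MAX_VALUE = 2**31 - 1


def effective_max_errors(config_value: int, request_value: Optional[int] = None) -> int:
    """Resolve the error limit for one request (hypothetical helper, not Solr code)."""
    # A maxErrors request parameter, when present, overrides the configured value.
    chosen = config_value if request_value is None else request_value
    # -1 from either source means "unlimited".
    if chosen == -1:
        return INTEGER_MAX_VALUE
    # Other values are used as-is; the description does not say how values
    # below -1 are treated, so this sketch does not guess.
    return chosen
```

      For instance, with the configuration shown earlier (maxErrors of 10), a request carrying maxErrors=-1 would resolve to the "unlimited" value, while a request without the parameter would keep the configured 10.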

      If maxErrors is exceeded for a request, the first exception that the processor caught is propagated back to the user, and metadata is set on that exception with the same details about all of the tolerated errors.
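
      The control flow described in the previous paragraph might be modelled roughly as follows. This is a hedged Python sketch, not the processor's actual implementation: run_tolerant, UpdateFailed, and the (type, id, action) tuples are hypothetical names. It also assumes the request fails only once the error count is greater than maxErrors, which is what the maxErrors=1 example in this description suggests (two errors are reported before the 400 status).

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each operation is (type, id, action); the action raises on failure.
Operation = Tuple[str, str, Callable[[], None]]


class UpdateFailed(Exception):
    """Hypothetical stand-in for the exception propagated back to the user."""

    def __init__(self, first: Exception, tolerated: List[Dict[str, str]]):
        super().__init__(str(first))
        self.first = first          # the first exception that was caught
        self.tolerated = tolerated  # details of every error seen so far


def run_tolerant(ops: List[Operation], max_errors: int) -> List[Dict[str, str]]:
    """Run every operation, tolerating failures until the limit is exceeded."""
    tolerated: List[Dict[str, str]] = []
    first_error: Optional[Exception] = None
    for op_type, op_id, action in ops:
        try:
            action()
        except Exception as exc:
            if first_error is None:
                first_error = exc
            tolerated.append({"type": op_type, "id": op_id, "message": str(exc)})
            # Assumption: fail only when the count goes past the limit.
            if len(tolerated) > max_errors:
                raise UpdateFailed(first_error, tolerated) from exc
    return tolerated
```

      With two failing operations, a limit of 10 would return both error records, while a limit of 1 would raise UpdateFailed carrying the first exception plus both records.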

      This next example is the same as the previous except that instead of maxErrors=-1 the request param is now maxErrors=1...

      $ curl 'http://localhost:8983/solr/techproducts/update?update.chain=tolerant-chain&wt=json&indent=true&maxErrors=1' -H "Content-Type: application/json" --data-binary '{"add" : { "doc":{"id":"1","foo_i":"bogus"}}, "delete": {"query":"malformed:["}}'
      {
        "responseHeader":{
          "errors":[{
              "type":"ADD",
              "id":"1",
              "message":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\""},
            {
              "type":"DELQ",
              "id":"malformed:[",
              "message":"org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    "}],
          "maxErrors":1,
          "status":400,
          "QTime":1},
        "error":{
          "metadata":[
            "org.apache.solr.common.ToleratedUpdateError--ADD:1","ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"",
            "org.apache.solr.common.ToleratedUpdateError--DELQ:malformed:[","org.apache.solr.search.SyntaxError: Cannot parse 'malformed:[': Encountered \"<EOF>\" at line 1, column 11.\nWas expecting one of:\n    <RANGE_QUOTED> ...\n    <RANGE_GOOP> ...\n    ",
            "error-class","org.apache.solr.common.SolrException",
            "root-error-class","java.lang.NumberFormatException"],
          "msg":"ERROR: [doc=1] Error adding field 'foo_i'='bogus' msg=For input string: \"bogus\"",
          "code":400}}
      

      ...the added exception metadata ensures that even in client code like the various SolrJ SolrClient implementations, which throw a (client side) exception on non-200 responses, the end user can access info on all the tolerated errors that were ignored before the maxErrors threshold was reached.
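
      As a rough illustration of how a client could recover the tolerated errors from that metadata, the sketch below parses the flat key/value list shown in the "error" section of the JSON response above. It is plain Python rather than SolrJ, the function name tolerated_errors is hypothetical, and it assumes the key layout seen in the example (the ToleratedUpdateError prefix, then the type, a colon, and the id).

```python
from typing import Dict, List

# Key prefix as it appears in the example response's "metadata" list.
PREFIX = "org.apache.solr.common.ToleratedUpdateError--"


def tolerated_errors(metadata: List[str]) -> List[Dict[str, str]]:
    """Extract tolerated-error records from a flat [key, value, key, value, ...] list."""
    errors = []
    for key, value in zip(metadata[0::2], metadata[1::2]):
        if not key.startswith(PREFIX):
            continue  # skip entries such as "error-class" and "root-error-class"
        # Split on the first ':' only, because the id itself may contain ':'
        # (for example the delete query "malformed:[").
        op_type, _, op_id = key[len(PREFIX):].partition(":")
        errors.append({"type": op_type, "id": op_id, "message": value})
    return errors
```

      Applied to the metadata list from the maxErrors=1 response, this would yield one record of type ADD with id 1 and one of type DELQ with id "malformed:[".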


      Original Jira Request

      Has anyone run into the problem of handling bad documents / failures mid-batch? For example:

      <add>
        <doc>
          <field name="id">1</field>
        </doc>
        <doc>
          <field name="id">2</field>
          <field name="myDateField">I_AM_A_BAD_DATE</field>
        </doc>
        <doc>
          <field name="id">3</field>
        </doc>
      </add>

      Right now Solr adds the first doc and then aborts. It would seem like it should either fail the entire batch (option 1) or log a message/return a code and then continue on to add doc 3 (option 2). Option 1 would seem to be much harder to accomplish and possibly require more memory, while option 2 would require more information to come back from the API. I'm about to dig into this, but I thought I'd ask to see if anyone had any suggestions, thoughts, or comments.

      1. SOLR-445.patch
        42 kB
        Erick Erickson
      2. solr-445.xml
        0.7 kB
        Erick Erickson
      3. SOLR-445.patch
        18 kB
        Erick Erickson
      4. SOLR-445-3_x.patch
        20 kB
        Erick Erickson
      5. SOLR-445.patch
        45 kB
        Erick Erickson
      6. SOLR-445_3x.patch
        46 kB
        Erick Erickson
      7. SOLR-445.patch
        44 kB
        Erick Erickson
      8. SOLR-445-alternative.patch
        23 kB
        Tomás Fernández Löbbe
      9. SOLR-445-alternative.patch
        23 kB
        Tomás Fernández Löbbe
      10. SOLR-445-alternative.patch
        26 kB
        Tomás Fernández Löbbe
      11. SOLR-445-alternative.patch
        36 kB
        Tomás Fernández Löbbe
      12. SOLR-445.patch
        40 kB
        Anshum Gupta
      13. SOLR-445.patch
        41 kB
        Anshum Gupta
      14. SOLR-445.patch
        40 kB
        Anshum Gupta
      15. SOLR-445.patch
        83 kB
        Hoss Man
      16. SOLR-445.patch
        88 kB
        Hoss Man


          Activity

          Will Johnson created issue -
          Grant Ingersoll made changes -
          Field Original Value New Value
          Assignee Grant Ingersoll [ gsingers ]
          Shalin Shekhar Mangar made changes -
          Fix Version/s 1.4 [ 12313351 ]
          Shalin Shekhar Mangar made changes -
          Fix Version/s 1.4 [ 12313351 ]
          Fix Version/s 1.5 [ 12313566 ]
          Grant Ingersoll made changes -
          Assignee Grant Ingersoll [ gsingers ]
          Hoss Man made changes -
          Fix Version/s Next [ 12315093 ]
          Fix Version/s 1.5 [ 12313566 ]
          Erick Erickson made changes -
          Assignee Erick Erickson [ erickerickson ]
          Erick Erickson made changes -
          Attachment SOLR-445.patch [ 12468218 ]
          Attachment solr-445.xml [ 12468219 ]
          Erick Erickson made changes -
          Attachment SOLR-445.patch [ 12468643 ]
          Attachment SOLR-445-3_x.patch [ 12468644 ]
          Erick Erickson made changes -
          Attachment SOLR-445.patch [ 12469113 ]
          Attachment SOLR-445_3x.patch [ 12469114 ]
          Grant Ingersoll made changes -
          Assignee Erick Erickson [ erickerickson ] -> Grant Ingersoll [ gsingers ]
          Erick Erickson made changes -
          Attachment SOLR-445.patch [ 12476579 ]
          Grant Ingersoll made changes -
          Summary "XmlUpdateRequestHandler bad documents mid batch aborts rest of batch" -> "Update Handlers abort with bad documents"
          Hoss Man made changes -
          Fix Version/s 3.2 [ 12316172 ]
          Fix Version/s Next [ 12315093 ]
          Grant Ingersoll made changes -
          Assignee Grant Ingersoll [ gsingers ]
          Erick Erickson made changes -
          Assignee Erick Erickson [ erickerickson ]
          Robert Muir made changes -
          Fix Version/s 3.3 [ 12316471 ]
          Fix Version/s 3.2 [ 12316172 ]
          Robert Muir made changes -
          Fix Version/s 3.4 [ 12316683 ]
          Fix Version/s 4.0 [ 12314992 ]
          Fix Version/s 3.3 [ 12316471 ]
          Robert Muir made changes -
          Fix Version/s 3.5 [ 12317876 ]
          Fix Version/s 3.4 [ 12316683 ]
          Simon Willnauer made changes -
          Fix Version/s 3.6 [ 12319065 ]
          Fix Version/s 3.5 [ 12317876 ]
          Erick Erickson made changes -
          Fix Version/s 3.6 [ 12319065 ]
          Erick Erickson made changes -
          Issue Type Bug [ 1 ] -> Improvement [ 4 ]
          Assignee Erick Erickson [ erickerickson ]
          Robert Muir made changes -
          Fix Version/s 4.1 [ 12321141 ]
          Fix Version/s 4.0 [ 12314992 ]
          Per Steffensen made changes -
          Link This issue is part of SOLR-3382 [ SOLR-3382 ]
          Per Steffensen made changes -
          Link This issue depends on SOLR-3382 [ SOLR-3382 ]
          Per Steffensen made changes -
          Link This issue depends on SOLR-3382 [ SOLR-3382 ]
          Erick Erickson made changes -
          Link This issue is related to SOLR-1113 [ SOLR-1113 ]
          Erick Erickson made changes -
          Link This issue is related to SOLR-3178 [ SOLR-3178 ]
          Mark Miller made changes -
          Fix Version/s 4.2 [ 12323893 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.1 [ 12321141 ]
          Robert Muir made changes -
          Fix Version/s 4.3 [ 12324128 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.2 [ 12323893 ]
          Uwe Schindler made changes -
          Fix Version/s 4.4 [ 12324324 ]
          Fix Version/s 4.3 [ 12324128 ]
          Steve Rowe made changes -
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.5 [ 12324743 ]
          Fix Version/s 4.4 [ 12324324 ]
          Adrien Grand made changes -
          Fix Version/s 4.6 [ 12325000 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.5 [ 12324743 ]
          Uwe Schindler made changes -
          Fix Version/s 4.7 [ 12325573 ]
          Fix Version/s 4.6 [ 12325000 ]
          David Smiley made changes -
          Fix Version/s 4.8 [ 12326254 ]
          Fix Version/s 4.7 [ 12325573 ]
          Tomás Fernández Löbbe made changes -
          Attachment SOLR-445-alternative.patch [ 12637960 ]
          Tomás Fernández Löbbe made changes -
          Attachment SOLR-445-alternative.patch [ 12638311 ]
          Uwe Schindler made changes -
          Fix Version/s 4.9 [ 12326731 ]
          Fix Version/s 5.0 [ 12321664 ]
          Fix Version/s 4.8 [ 12326254 ]
          Tomás Fernández Löbbe made changes -
          Attachment SOLR-445-alternative.patch [ 12640892 ]
          Tomás Fernández Löbbe made changes -
          Attachment SOLR-445-alternative.patch [ 12642044 ]
          Anshum Gupta made changes -
          Assignee Anshum Gupta [ anshumg ]
          Anshum Gupta made changes -
          Fix Version/s Trunk [ 12321664 ]
          Fix Version/s 4.9 [ 12326731 ]
          Anshum Gupta made changes -
          Attachment SOLR-445.patch [ 12746033 ]
          Anshum Gupta made changes -
          Attachment SOLR-445.patch [ 12746507 ]
          Anshum Gupta made changes -
          Attachment SOLR-445.patch [ 12747297 ]
          Mikhail Khludnev made changes -
          Link This issue is duplicated by SOLR-7914 [ SOLR-7914 ]
          Hoss Man made changes -
          Attachment SOLR-445.patch [ 12778147 ]
          Hoss Man made changes -
          Attachment SOLR-445.patch [ 12786596 ]
          Hoss Man made changes -
          Assignee Anshum Gupta [ anshumg ] -> Hoss Man [ hossman ]
          Hoss Man made changes -
          Link This issue depends upon SOLR-8633 [ SOLR-8633 ]
          Hoss Man made changes -
          Link This issue is blocked by SOLR-8738 [ SOLR-8738 ]
          Hoss Man made changes -
          Link This issue is related to SOLR-8872 [ SOLR-8872 ]
          Hoss Man made changes -
          Link This issue is related to SOLR-8881 [ SOLR-8881 ]
          Hoss Man made changes -
          Description updated to the current text, with the original request moved into an "Original Jira Request" panel
          Fix Version/s master [ 12321664 ]
          Fix Version/s 6.1 [ 12334969 ]
          Affects Version/s 1.3 [ 12312486 ]
          Hoss Man made changes -
          Status Open [ 1 ] -> Resolved [ 5 ]
          Resolution Fixed [ 1 ]

            People

            • Assignee: Hoss Man
            • Reporter: Will Johnson
            • Votes: 6
            • Watchers: 21
