Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-12842

CDCR stops replication and goes into infinite loop of retrying, if one of the updates are corrupted.

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 8.0
    • Fix Version/s: None
    • Component/s: CDCR
    • Labels:
      None

      Description

      Currently, CDCR reads updates from the transaction logs, create UpdateRequest and forwards it to the target. If the UpdateRequest sent fails, the same request is retried indefinitely until a successful acknowledgment is received.

      2018-10-03 00:05:38.878 WARN (cdcr-replicator-35-thread-4) [ ] o.a.s.h.CdcrReplicator Failed to forward update request to target: DEMO_DR
      org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.202.153.32:3987/solr/DEMO_DR: version conflict for ec53e70d-72fb-42bd-827e-a3fc54e33bad expected=1613259487692455936 actual=-1
      at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
      at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
      at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
      at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
      at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
      at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1106) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
      at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:886) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
      at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:819) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
      at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
      at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
      at org.apache.solr.handler.CdcrReplicator.sendRequest(CdcrReplicator.java:140) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
      at org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:120) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
      at org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81) ~[solr-core-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:13]
      at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[solr-solrj-7.4.0.jar:7.4.0 9060ac689c270b02143f375de0348b7f626adebc - jpountz - 2018-06-18 16:55:14]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0-zing_18.07.1.0]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0-zing_18.07.1.0]
      at java.lang.Thread.run(Thread.java:748) [?:1.8.0-zing_18.07.1.0]
      

      This design is in place when connected to target is broken and you need to halt the forwarding, but better error handling should be in place if a bad payload or other external factors are causing the failures. Suggestions and feedbacks are welcome.

        Attachments

          Activity

            People

            • Assignee:
              Unassigned
              Reporter:
              sarkaramrit2@gmail.com Amrit Sarkar
            • Votes:
              3 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated: