Uploaded image for project: 'Solr'
  1. Solr
  2. SOLR-12470

Search Rate Trigger created more than 3 replicas

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.5
    • Component/s: AutoScaling
    • Labels:
      None

      Description

      Here's the trigger that I created . At this point the collection was one shard and one replica ( on node3 )

      curl -X POST -H 'Content-type:application/json' --data-binary '{
      "set-trigger": {
      "name" : "search_rate_trigger",
      "event" : "searchRate",
      "collection" : "test_rate_trigger",
      "rate" : 1.0,
      "waitFor" : "1m",
      "enabled" : true,
      "actions" : [
      {
      "name" : "compute_plan",
      "class": "solr.ComputePlanAction"
      },
      {
      "name" : "execute_plan",
      "class": "solr.ExecutePlanAction"
      }
      ]
      }
      }' http://localhost:8983/solr/admin/autoscaling

       

      I also had a trigger listener setup as I was testing the listener feature

      curl -X POST -H 'Content-type:application/json' --data-binary '{
      "set-listener": {
      "name": "search_rate_listener",
      "trigger": "search_rate_trigger",
      "stage": ["STARTED", "ABORTED", "SUCCEEDED", "FAILED"],
      "class": "solr.SystemLogListener"
      }
      }' http://localhost:8983/solr/admin/autoscaling

       

      I ran a script to fire queries at every 100ms . The index didn't have any docs so it's a simple match all query

      while [ 1 ]
      do
      curl -s "http://localhost:8984/solr/test_rate_trigger/select/?q=*:*" > /dev/null
      sleep .1
      done

      After a few minutes I see 4 replicas being created.

      Attaching logs from all 4 nodes. It should be fairly easy to reproduce with the above mentioned steps

      Also attaching all the docs from the .system collection for reference

      Here's another interesting this I noticed. I re-created the setup but this time removed the execute_plan part

      Now every 1 min the compute plan action tries to create 3 replicas . Why I found this interesting is that it was trying to create two replicas on the same node
      Does this look like a separate bug?

      INFO - 2018-06-08 03:41:32.586; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0
      INFO - 2018-06-08 03:41:40.909; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=1
      INFO - 2018-06-08 03:41:40.932; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
      INFO - 2018-06-08 03:41:40.933; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
      INFO - 2018-06-08 03:41:40.934; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
      INFO - 2018-06-08 03:41:40.934; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9184331 sessionWrapper.createTime 9184324085271, this.sessionWrapper.createTime 9184324085271
      INFO - 2018-06-08 03:42:32.604; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0
      INFO - 2018-06-08 03:42:41.525; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=0
      INFO - 2018-06-08 03:42:41.559; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
      INFO - 2018-06-08 03:42:41.560; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
      INFO - 2018-06-08 03:42:41.560; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
      INFO - 2018-06-08 03:42:41.561; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9244959 sessionWrapper.createTime 9244956725861, this.sessionWrapper.createTime 9244956725861
      INFO - 2018-06-08 03:43:32.622; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0
      INFO - 2018-06-08 03:43:42.158; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=1
      INFO - 2018-06-08 03:43:42.178; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
      INFO - 2018-06-08 03:43:42.180; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
      INFO - 2018-06-08 03:43:42.181; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
      INFO - 2018-06-08 03:43:42.181; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9305581 sessionWrapper.createTime 9305577119413, this.sessionWrapper.createTime 9305577119413
      INFO - 2018-06-08 03:44:32.642; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used} status=0 QTime=0
      INFO - 2018-06-08 03:44:42.759; [ ] org.apache.solr.servlet.HttpSolrCall; [admin] webapp=null path=/admin/metrics params={prefix=CONTAINER.fs.usableSpace,CORE.coreName&wt=javabin&version=2&group=solr.node,solr.core} status=0 QTime=0
      INFO - 2018-06-08 03:44:42.778; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8984_solr&type=NRT
      INFO - 2018-06-08 03:44:42.779; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
      INFO - 2018-06-08 03:44:42.779; [ ] org.apache.solr.cloud.autoscaling.ComputePlanAction; Computed Plan: action=ADDREPLICA&collection=test_rate_trigger&shard=shard1&node=127.94.0.1:8983_solr&type=NRT
      INFO - 2018-06-08 03:44:42.779; [ ] org.apache.solr.client.solrj.cloud.autoscaling.PolicyHelper; returnSession, curr-time 9366182 sessionWrapper.createTime 9366178796748, this.sessionWrapper.createTime 9366178796748

       

      Thirdly, for this above mentioned test I started observing the .system collection . Here is a query that I used to get the documents created from the first time the listener kicked in

      http://localhost:8983/solr/.system/select?fq=source_s:SystemLogListener&q=*:*&rows=4&sort=timestamp%20asc

      My first expectation was that I'd see 3 docs but I saw 4 docs. Curious why it's 4 ( the docs are attached as 4_docs.json )

      My intention here is to remove the system log listener with an http listener here I wanted to understand should I be looking out for 4 events or 3
      The first reaction here is it's a minor bug hence I'm putting it as part of this jira

      Happy to break it up into smaller Jiras once I hear back if these are valid issues. This test was run against master

        Attachments

        1. bug_report.txt
          8 kB
          Varun Thacker
        2. 4_docs.json
          13 kB
          Varun Thacker
        3. graph_view.png
          65 kB
          Varun Thacker
        4. multiple_replicas.zip
          192 kB
          Varun Thacker
        5. system_docs.json
          251 kB
          Varun Thacker

          Activity

            People

            • Assignee:
              ab Andrzej Bialecki
              Reporter:
              varunthacker Varun Thacker
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: