Solr / SOLR-9735

Umbrella JIRA for Auto Scaling and Cluster Management in SolrCloud

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.4
    • Component/s: AutoScaling
    • Labels: None

      Description

      As SolrCloud is now used at fairly large scale, most users end up writing their own cluster management tools. We should have a framework for cluster management in Solr.
      In a discussion with Noble Paul, we outlined the following steps for implementing it:

      • Basic API calls for cluster management, e.g. utilize added nodes, remove a node, etc. To begin with, users would need to invoke these calls explicitly. Each call would also specify the strategy to use. For instance, a strategy called optimizeCoreCount would aim for an even number of cores on each node. A strategy could optionally take parameters as well.
      • Metrics and stats tracking, e.g. QPS. These would be required for any advanced cluster management task, e.g. maintaining a QPS of 'x' by automatically adding a replica (using a recipe). We would need collection-, shard-, and node-level views of the metrics for this.
      • Recipes: combinations of multiple sequential or parallel API calls based on rules. This would be complicated, especially as most recipes would be long-running series of tasks that would have to be either rolled back or resumed after a failure.
      • Event-based triggers that would not require end users to make explicit cluster management calls.
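
      To make the optimizeCoreCount idea above concrete, here is a minimal sketch of what such a strategy might compute. It is not Solr code: the function name, the node names, and the "move one core at a time from the busiest node to the idlest node" rule are all assumptions made for illustration. It only plans moves; a real implementation would issue cluster management API calls and handle failures.

```python
from typing import Dict, List, Tuple


def optimize_core_count(cores_per_node: Dict[str, int]) -> List[Tuple[str, str]]:
    """Plan single-core moves that even out core counts across nodes.

    Hypothetical helper, not part of Solr. Returns a list of
    (source_node, target_node) pairs, one pair per core to move.
    """
    counts = dict(cores_per_node)  # work on a copy; the input is left untouched
    moves: List[Tuple[str, str]] = []
    if len(counts) < 2:
        return moves  # nothing to balance with fewer than two nodes

    while True:
        # Iterate over sorted node names so ties are broken deterministically.
        busiest = max(sorted(counts), key=counts.get)
        idlest = min(sorted(counts), key=counts.get)
        # Stop once no node holds more than one core above any other node.
        if counts[busiest] - counts[idlest] <= 1:
            break
        counts[busiest] -= 1
        counts[idlest] += 1
        moves.append((busiest, idlest))
    return moves


if __name__ == "__main__":
    # Example cluster layout (made-up node names and core counts).
    plan = optimize_core_count({"node1": 4, "node2": 1, "node3": 1})
    for source, target in plan:
        print(f"move one core from {source} to {target}")
```

      In this sketch the strategy is a pure planning step, which keeps it separate from execution; that separation is one way a recipe could later roll back or resume a partially applied plan.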

          Issue Links

          1. An UTILIZENODE command (Sub-task, Closed, Noble Paul)
          2. MOVEREPLICA API (Sub-task, Closed, Cao Manh Dat)
          3. A new DSL to set cluster-wide preferences on how to allocate replicas to nodes (Sub-task, Resolved, Noble Paul)
          4. Implement set-trigger and remove-trigger APIs (Sub-task, Closed, Shalin Shekhar Mangar)
          5. Implement set-listener and remove-listener API (Sub-task, Closed, Shalin Shekhar Mangar)
          6. Implement suspend-trigger and resume-trigger APIs (Sub-task, Closed, Shalin Shekhar Mangar)
          7. Implement read API for autoscaling configuration (Sub-task, Resolved, Shalin Shekhar Mangar)
          8. Implement set-policy and remove-policy APIs (Sub-task, Resolved, Cao Manh Dat)
          9. Implement trigger for nodeAdded event (Sub-task, Closed, Shalin Shekhar Mangar)
          10. Implement trigger support for nodeLost event type (Sub-task, Closed, Cao Manh Dat)
          11. Port 'autoAddReplicas' feature to the autoscaling framework and make it work with non-shared filesystems (Sub-task, Closed, Cao Manh Dat)
          12. All collection APIs should use the new Policy framework for replica placement (Sub-task, Resolved, Noble Paul)
          13. Implement ComputePlanAction for autoscaling (Sub-task, Closed, Shalin Shekhar Mangar)
          14. Each trigger fire event should be assigned a unique id (Sub-task, Resolved, Andrzej Bialecki)
          15. Persist intermediate trigger state in ZK to continue tracking information across overseer restarts (Sub-task, Resolved, Andrzej Bialecki)
          16. Triggers should be able to restore state from old instances when taking over (Sub-task, Closed, Shalin Shekhar Mangar)
          17. Minor suspend-trigger and resume-trigger API improvements (Sub-task, Resolved, Andrzej Bialecki)
          18. Throttling strategy for triggers and policy executions (Sub-task, Closed, Shalin Shekhar Mangar)
          19. Expose a diagnostics API to return nodes sorted by load in descending order and any policy violations (Sub-task, Resolved, Shalin Shekhar Mangar)
          20. OverseerTriggerThread does not start triggers on overseer start until autoscaling config watcher is fired (Sub-task, Closed, Shalin Shekhar Mangar)
          21. TriggerAction is initialised even if the trigger is never scheduled (Sub-task, Closed, Shalin Shekhar Mangar)
          22. Reliably create nodeAdded / nodeLost events (Sub-task, Resolved, Andrzej Bialecki)
          23. AutoScalingHandler should validate policy and preferences before updating zookeeper (Sub-task, Closed, Shalin Shekhar Mangar)
          24. Allow nodeAdded / nodeLost events to report multiple nodes in one event. (Sub-task, Closed, Andrzej Bialecki)
          25. Improve error handling and tests for Snitch and subclasses (Sub-task, Resolved, Shalin Shekhar Mangar)
          26. Add support for different replica types in the new policy framework (Sub-task, Closed, Noble Paul)
          27. Write documentation for the autoscaling APIs and policy/preferences syntax for Solr 7.0 (Sub-task, Closed, Shalin Shekhar Mangar)
          28. Concurrent execution of Policy computations should yield correct result (Sub-task, Closed, Noble Paul)
          29. Resolve conflicting package names o.a.s.cloud.autoscaling (Sub-task, Resolved, Ishan Chattopadhyaya)
          30. Implement ExecutePlanAction for autoscaling (Sub-task, Closed, Shalin Shekhar Mangar)
          31. CREATE & CREATESHARD to support replica types when using policy (Sub-task, Closed, Noble Paul)
          32. Implement TriggerListener API (Sub-task, Closed, Andrzej Bialecki)
          33. Changes made via AutoScalingHandler should be atomic (Sub-task, Resolved, Andrzej Bialecki)
          34. Policy can suggest more operations than necessary (Sub-task, Closed, Noble Paul)
          35. Implement LogPlanAction for autoscaling (Sub-task, Closed, Andrzej Bialecki)
          36. Fix AutoScalingSnitch's use of usableSpace metrics to account for solr.data.home and dataDir in solrconfig.xml (Sub-task, Closed, Unassigned)
          37. Use disk free metric in default cluster preferences (Sub-task, Closed, Noble Paul)
          38. Add support for spins metric in Policy (Sub-task, Closed, Noble Paul)
          39. Policy should accept disk space as a hint (Sub-task, Closed, Noble Paul)
          40. Collection APIs should provide disk space hint to Policy when possible (Sub-task, Closed, Noble Paul)
          41. Implement a scheduled trigger (Sub-task, Closed, Shalin Shekhar Mangar)
          42. REPLACENODE should make it optional to provide a target node (Sub-task, Closed, Noble Paul)
          43. Implement trigger for searchRate event type (Sub-task, Closed, Andrzej Bialecki)
          44. New /autoscaling/history API to return past cluster events and actions (Sub-task, Closed, Andrzej Bialecki)
          45. Unused field Row.violations (Sub-task, Closed, Noble Paul)
          46. Improve resiliency of autoscaling actions (Sub-task, Closed, Shalin Shekhar Mangar)
          47. remove-policy must fail if a policy to be deleted is used by a collection (Sub-task, Closed, Noble Paul)
          48. Write documentation for autoscaling APIs, triggers, actions, listeners for Solr 7.1 (Sub-task, Closed, Shalin Shekhar Mangar)
          49. Change error handling in AutoScalingHandler to be consistent w/ other APIs (Sub-task, Resolved, Noble Paul)
          50. Implement trigger for arbitrary metrics (Sub-task, Closed, Shalin Shekhar Mangar)
          51. Implement a set-property command for AutoScaling API (Sub-task, Closed, Shalin Shekhar Mangar)
          52. Make arbitrary metrics values available for policies (Sub-task, Resolved, Noble Paul)
          53. TriggerListener registration bug (Sub-task, Closed, Andrzej Bialecki)
          54. Lock autoscaling triggers when changes they requested are being made (Sub-task, Closed, Andrzej Bialecki)
          55. AddReplicaSuggester should support collection+shard hints (Sub-task, Closed, Noble Paul)
          56. MODIFYCOLLECTION should be able to edit policy attribute (Sub-task, Resolved, Noble Paul)
          57. ComputePlanAction should accept configuration to compute plans only for specific collections (Sub-task, Closed, Andrzej Bialecki)
          58. Implement an option in collection commands to wait for command results (Sub-task, Closed, Andrzej Bialecki)
          59. Implement a periodic house-keeping task (Sub-task, Closed, Andrzej Bialecki)
          60. Pause triggers until actions finish executing and the cool down period expires (Sub-task, Closed, Shalin Shekhar Mangar)
          61. Remove action throttle (Sub-task, Closed, Shalin Shekhar Mangar)
          62. Allow searchRate trigger to delete replicas (Sub-task, Closed, Andrzej Bialecki)
          63. MOVEREPLICA suggester should not suggest the leader to be moved if there are other replicas (Sub-task, Closed, Noble Paul)
          64. forbid multiple COLL_SHARD hints (Sub-task, Resolved, Noble Paul)
          65. Refactor Policy framework to let state changes to be applied to all nodes (Sub-task, Closed, Noble Paul)
          66. AutoScalingHandler should validate triggers before updating zookeeper (Sub-task, Closed, Andrzej Bialecki)
          67. Add trigger based on document count (Sub-task, Closed, Andrzej Bialecki)
          68. Response /autoscaling/diagnostics shows improper json (Sub-task, Closed, Noble Paul)

              People

              • Assignee: Shalin Shekhar Mangar (shalin)
              • Reporter: Anshum Gupta (anshum)
              • Votes: 5
              • Watchers: 27

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Time Tracking

                  Estimated: 1,344h
                  Remaining: 1,344h
                  Logged: Not Specified