SOLR-9735: Umbrella JIRA for Auto Scaling and Cluster Management in SolrCloud


    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.4
    • Component/s: AutoScaling
    • Labels:
      None

      Description

      As SolrCloud is now used at fairly large scale, most users end up writing their own cluster management tools. We should have a framework for cluster management in Solr.
      In a discussion with Noble Paul, we outlined the following steps for implementing this:

      • Basic API calls for cluster management, e.g. utilize added nodes, remove a node, etc. To begin with, users would need to invoke these calls explicitly. Each call would also specify the strategy to use. For instance, a strategy called optimizeCoreCount would aim for an even number of cores on each node. A strategy could optionally take parameters as well.
      • Metrics and stats tracking, e.g. qps. These would be required for any advanced cluster management task, e.g. maintaining a qps of 'x' by auto-adding a replica (using a recipe). We would need collection-, shard-, and node-level views of metrics for this.
      • Recipes: combinations of multiple sequential or parallel API calls based on rules. This would be complicated, especially as most of these would be long-running series of tasks that would have to be either rolled back or resumed in case of a failure.
      • Event-based triggers that would not require explicit cluster management calls from end users.
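
      To make the optimizeCoreCount strategy mentioned above more concrete, here is a minimal sketch of how such a strategy might compute moves. It is not Solr code and not the design adopted in the sub-tasks: the function name `optimize_core_count`, the node and core names, and the greedy approach are all assumptions for illustration. It repeatedly moves one core from the busiest node to the least loaded node until the counts differ by at most one.

```python
def optimize_core_count(cores_by_node):
    """Plan moves that even out the number of cores per node.

    Hypothetical illustration only: `cores_by_node` maps a node name to the
    list of core names it hosts. Returns (moves, placement), where each move
    is a (core, source_node, target_node) tuple.
    """
    # Work on a copy so the caller's mapping is left untouched.
    placement = {node: list(cores) for node, cores in cores_by_node.items()}
    moves = []
    if not placement:
        return moves, placement

    while True:
        busiest = max(placement, key=lambda n: len(placement[n]))
        idlest = min(placement, key=lambda n: len(placement[n]))
        # Once the spread is at most one core, no single move can improve it.
        if len(placement[busiest]) - len(placement[idlest]) <= 1:
            break
        core = placement[busiest].pop()
        placement[idlest].append(core)
        moves.append((core, busiest, idlest))

    return moves, placement


if __name__ == "__main__":
    cluster = {"node1": ["c1", "c2", "c3", "c4"], "node2": [], "node3": ["c5"]}
    planned_moves, balanced = optimize_core_count(cluster)
    for core, source, target in planned_moves:
        print(f"move {core}: {source} -> {target}")
```

      In a real cluster each planned move would map onto an explicit API call issued by the user, and a recipe would have to track which moves had completed so that the series could be resumed or rolled back after a failure.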

        Issue Links

        Each entry lists: title (type, status, assignee).

        1. An UTILIZENODE command (Sub-task, Closed, Noble Paul)
        2. MOVEREPLICA API (Sub-task, Closed, Cao Manh Dat)
        3. A new DSL to set cluster-wide preferences on how to allocate replicas to nodes (Sub-task, Resolved, Noble Paul)
        4. Implement set-trigger and remove-trigger APIs (Sub-task, Closed, Shalin Shekhar Mangar)
        5. Implement set-listener and remove-listener API (Sub-task, Closed, Shalin Shekhar Mangar)
        6. Implement suspend-trigger and resume-trigger APIs (Sub-task, Closed, Shalin Shekhar Mangar)
        7. Implement read API for autoscaling configuration (Sub-task, Resolved, Shalin Shekhar Mangar)
        8. Implement set-policy and remove-policy APIs (Sub-task, Resolved, Cao Manh Dat)
        9. Implement trigger for nodeAdded event (Sub-task, Closed, Shalin Shekhar Mangar)
        10. Implement trigger support for nodeLost event type (Sub-task, Closed, Cao Manh Dat)
        11. Port 'autoAddReplicas' feature to the autoscaling framework and make it work with non-shared filesystems (Sub-task, Closed, Cao Manh Dat)
        12. All collection APIs should use the new Policy framework for replica placement (Sub-task, Resolved, Noble Paul)
        13. Implement ComputePlanAction for autoscaling (Sub-task, Closed, Shalin Shekhar Mangar)
        14. Each trigger fire event should be assigned a unique id (Sub-task, Resolved, Andrzej Bialecki)
        15. Persist intermediate trigger state in ZK to continue tracking information across overseer restarts (Sub-task, Resolved, Andrzej Bialecki)
        16. Triggers should be able to restore state from old instances when taking over (Sub-task, Closed, Shalin Shekhar Mangar)
        17. Minor suspend-trigger and resume-trigger API improvements (Sub-task, Resolved, Andrzej Bialecki)
        18. Throttling strategy for triggers and policy executions (Sub-task, Closed, Shalin Shekhar Mangar)
        19. Expose a diagnostics API to return nodes sorted by load in descending order and any policy violations (Sub-task, Resolved, Shalin Shekhar Mangar)
        20. OverseerTriggerThread does not start triggers on overseer start until autoscaling config watcher is fired (Sub-task, Closed, Shalin Shekhar Mangar)
        21. TriggerAction is initialised even if the trigger is never scheduled (Sub-task, Closed, Shalin Shekhar Mangar)
        22. Reliably create nodeAdded / nodeLost events (Sub-task, Resolved, Andrzej Bialecki)
        23. AutoScalingHandler should validate policy and preferences before updating zookeeper (Sub-task, Closed, Shalin Shekhar Mangar)
        24. Allow nodeAdded / nodeLost events to report multiple nodes in one event. (Sub-task, Closed, Andrzej Bialecki)
        25. Improve error handling and tests for Snitch and subclasses (Sub-task, Resolved, Shalin Shekhar Mangar)
        26. Add support for different replica types in the new policy framework (Sub-task, Closed, Noble Paul)
        27. Write documentation for the autoscaling APIs and policy/preferences syntax for Solr 7.0 (Sub-task, Closed, Shalin Shekhar Mangar)
        28. Concurrent execution of Policy computations should yield correct result (Sub-task, Closed, Noble Paul)
        29. Resolve conflicting package names o.a.s.cloud.autoscaling (Sub-task, Resolved, Ishan Chattopadhyaya)
        30. Implement ExecutePlanAction for autoscaling (Sub-task, Closed, Shalin Shekhar Mangar)
        31. CREATE & CREATESHARD to support replica types when using policy (Sub-task, Closed, Noble Paul)
        32. Implement TriggerListener API (Sub-task, Closed, Andrzej Bialecki)
        33. Changes made via AutoScalingHandler should be atomic (Sub-task, Resolved, Andrzej Bialecki)
        34. Policy can suggest more operations than necessary (Sub-task, Closed, Noble Paul)
        35. Implement LogPlanAction for autoscaling (Sub-task, Closed, Andrzej Bialecki)
        36. Fix AutoScalingSnitch's use of usableSpace metrics to account for solr.data.home and dataDir in solrconfig.xml (Sub-task, Closed, Unassigned)
        37. Use disk free metric in default cluster preferences (Sub-task, Closed, Noble Paul)
        38. Add support for spins metric in Policy (Sub-task, Closed, Noble Paul)
        39. Policy should accept disk space as a hint (Sub-task, Closed, Noble Paul)
        40. Collection APIs should provide disk space hint to Policy when possible (Sub-task, Closed, Noble Paul)
        41. Implement a scheduled trigger (Sub-task, Closed, Shalin Shekhar Mangar)
        42. REPLACENODE should make it optional to provide a target node (Sub-task, Closed, Noble Paul)
        43. Implement trigger for searchRate event type (Sub-task, Closed, Andrzej Bialecki)
        44. New /autoscaling/history API to return past cluster events and actions (Sub-task, Closed, Andrzej Bialecki)
        45. Unused field Row.violations (Sub-task, Closed, Noble Paul)
        46. Improve resiliency of autoscaling actions (Sub-task, Closed, Shalin Shekhar Mangar)
        47. remove-policy must fail if a policy to be deleted is used by a collection (Sub-task, Closed, Noble Paul)
        48. Write documentation for autoscaling APIs, triggers, actions, listeners for Solr 7.1 (Sub-task, Closed, Shalin Shekhar Mangar)
        49. Change error handling in AutoScalingHandler to be consistent w/ other APIs (Sub-task, Resolved, Noble Paul)
        50. Implement trigger for arbitrary metrics (Sub-task, Closed, Shalin Shekhar Mangar)
        51. Implement a set-property command for AutoScaling API (Sub-task, Closed, Shalin Shekhar Mangar)
        52. Make arbitrary metrics values available for policies (Sub-task, Resolved, Noble Paul)
        53. TriggerListener registration bug (Sub-task, Closed, Andrzej Bialecki)
        54. Lock autoscaling triggers when changes they requested are being made (Sub-task, Closed, Andrzej Bialecki)
        55. AddReplicaSuggester should support collection+shard hints (Sub-task, Closed, Noble Paul)
        56. MODIFYCOLLECTION should be able to edit policy attribute (Sub-task, Resolved, Noble Paul)
        57. ComputePlanAction should accept configuration to compute plans only for specific collections (Sub-task, Closed, Andrzej Bialecki)
        58. Implement an option in collection commands to wait for command results (Sub-task, Closed, Andrzej Bialecki)
        59. Implement a periodic house-keeping task (Sub-task, Closed, Andrzej Bialecki)
        60. Pause triggers until actions finish executing and the cool down period expires (Sub-task, Closed, Shalin Shekhar Mangar)
        61. Remove action throttle (Sub-task, Closed, Shalin Shekhar Mangar)
        62. Allow searchRate trigger to delete replicas (Sub-task, Closed, Andrzej Bialecki)
        63. MOVEREPLICA suggester should not suggest the leader to be moved if there are other replicas (Sub-task, Closed, Noble Paul)
        64. forbid multiple COLL_SHARD hints (Sub-task, Resolved, Noble Paul)
        65. Refactor Policy framework to let state changes to be applied to all nodes (Sub-task, Closed, Noble Paul)
        66. AutoScalingHandler should validate triggers before updating zookeeper (Sub-task, Closed, Andrzej Bialecki)
        67. Add trigger based on document count (Sub-task, Closed, Andrzej Bialecki)
        68. Response /autoscaling/diagnostics shows improper json (Sub-task, Closed, Noble Paul)

            People

            • Assignee: Shalin Shekhar Mangar (shalin)
            • Reporter: Anshum Gupta (anshum)

              Time Tracking

              Estimated: 1,344h
              Remaining: 1,344h
              Logged: Not Specified
