Type: New Feature
Affects Version/s: None
Fix Version/s: None
From the mailing list:
Add an elastic scaling monitor and coordinator, i.e. a whirr process that would be running on some or all of the nodes that:
- would collect load metrics (both generic and specific to each application)
- would feed them through an elastic decision-making engine (also specific to each application, as it depends on the specific metrics)
- would then act on those decisions by either expanding or contracting the cluster.
- it need not be completely distributed, i.e. it can have a specific node assigned to monitor/coordinate, but that node must not be fixed: the role could/should move to another node if the current coordinator leaves the cluster.
- each application would define the set of metrics that it emits and use a local monitor process to feed them to the coordinator.
- the monitor process should emit some standard metrics (Disk I/O, CPU Load, Net I/O, memory)
- the coordinator would have a pluggable decision engine policy also defined by the application that would consume metrics and make a decision.
- whirr would take care of requesting/releasing nodes and adding/removing them from the relevant services.
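To make the "pluggable decision engine policy" idea concrete, here is a minimal sketch of how metrics could flow into a decision. Everything in it is hypothetical and not part of Whirr: the names `MetricSample`, `Decision`, `DecisionPolicy`, `CpuThresholdPolicy` and `coordinate`, the `cpu_load` metric name, and the 0.80/0.20 thresholds are assumptions chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Protocol, Sequence


class Decision(Enum):
    EXPAND = "expand"
    CONTRACT = "contract"
    HOLD = "hold"


@dataclass(frozen=True)
class MetricSample:
    """One metric reading emitted by a node's local monitor (hypothetical shape)."""
    node: str
    name: str    # e.g. "cpu_load"; the metric names are an assumption
    value: float


class DecisionPolicy(Protocol):
    """What an application would implement to plug in its own policy."""
    def decide(self, samples: Sequence[MetricSample]) -> Decision: ...


@dataclass
class CpuThresholdPolicy:
    """Example policy: compare the average CPU load against two thresholds."""
    high: float = 0.80  # assumed threshold, not from the proposal
    low: float = 0.20   # assumed threshold, not from the proposal

    def decide(self, samples: Sequence[MetricSample]) -> Decision:
        loads = [s.value for s in samples if s.name == "cpu_load"]
        if not loads:
            return Decision.HOLD  # no data: leave the cluster alone
        avg = mean(loads)
        if avg > self.high:
            return Decision.EXPAND
        if avg < self.low:
            return Decision.CONTRACT
        return Decision.HOLD


def coordinate(policy: DecisionPolicy, samples: Sequence[MetricSample],
               cluster_size: int, min_size: int = 1) -> int:
    """Return the target cluster size after applying one policy decision."""
    decision = policy.decide(samples)
    if decision is Decision.EXPAND:
        return cluster_size + 1
    if decision is Decision.CONTRACT and cluster_size > min_size:
        return cluster_size - 1
    return cluster_size
```

In this sketch the coordinator only computes a target size; actually requesting or releasing nodes would remain the job of whirr, as the last bullet above says.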
Some implementation ideas:
- it could run on top of ZooKeeper. ZooKeeper is already a requirement for several services, and it would allow coordinator state to be stored reliably so that another node can pick up if the previous coordinator leaves the cluster.
- it could use Avro to serialize/deserialize metrics data
- it should be optional, i.e. simply another service that the whirr cli starts
- it would also be nice to have a monitor/coordinator web page that displays metrics and cluster status in an aggregated view.
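The coordinator hand-off mentioned in the ZooKeeper idea could behave roughly as in the sketch below. It is an in-memory stand-in only and does not call any ZooKeeper API: the `Membership` class and its methods are hypothetical, and the "lowest join sequence number becomes coordinator" rule is an assumption modeled on the usual ZooKeeper leader-election recipe with ephemeral sequential nodes.

```python
import itertools
from typing import Dict, Optional


class Membership:
    """In-memory stand-in for a ZooKeeper-style group of cluster nodes."""

    def __init__(self) -> None:
        self._seq = itertools.count()
        # node name -> join sequence number (plays the role of a sequential znode)
        self._members: Dict[str, int] = {}
        # shared coordinator state, kept outside any single node so it survives failover
        self.state: Dict[str, object] = {}

    def join(self, node: str) -> None:
        """Register a node; it receives the next sequence number."""
        self._members[node] = next(self._seq)

    def leave(self, node: str) -> None:
        """Remove a node, as would happen when its session ends."""
        self._members.pop(node, None)

    def coordinator(self) -> Optional[str]:
        """The member with the lowest sequence number coordinates, if any exist."""
        if not self._members:
            return None
        return min(self._members, key=self._members.get)
```

A short usage example under the same assumptions: the coordinator role moves to the next node when the current one leaves, while the stored state stays available.

```python
group = Membership()
for name in ("node-a", "node-b", "node-c"):
    group.join(name)

group.state["target_size"] = 3
group.leave(group.coordinator())   # the current coordinator leaves the cluster
new_coordinator = group.coordinator()
```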