Apache Whirr (retired) / WHIRR-238

Scaling Monitor/Coordinator

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      From the mailing list:

      General idea:
      Add an elastic scaling monitor and coordinator, i.e. a Whirr process running on some or all of the nodes that:

      • would collect load metrics (both generic and specific to each application)
      • would feed them through an elastic decision-making engine (also specific to each application, since it depends on the application's metrics)
      • would then act on those decisions by either expanding or contracting the cluster.
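The collect, decide, act cycle described above could be sketched roughly as follows. This is a minimal illustration only: `ScalingLoopSketch`, `DecisionEngine`, `cpuThreshold`, the `cpu.load` metric key and the thresholds are all hypothetical names and values, not part of Whirr, and the "act" step just returns a new node count instead of requesting or releasing real nodes.

```java
import java.util.HashMap;
import java.util.Map;

public class ScalingLoopSketch {
    // Possible outcomes of one decision cycle.
    enum Decision { EXPAND, CONTRACT, HOLD }

    // Hypothetical application-specific decision engine: metrics in, decision out.
    interface DecisionEngine {
        Decision decide(Map<String, Double> metrics);
    }

    // Example policy (an assumption): expand above `high` CPU load, contract below `low`.
    static DecisionEngine cpuThreshold(double high, double low) {
        return metrics -> {
            double cpu = metrics.getOrDefault("cpu.load", 0.0);
            if (cpu > high) return Decision.EXPAND;
            if (cpu < low) return Decision.CONTRACT;
            return Decision.HOLD;
        };
    }

    // Stand-in for the "act" step: returns the new cluster size, never below one node.
    static int apply(Decision decision, int currentNodes) {
        switch (decision) {
            case EXPAND: return currentNodes + 1;
            case CONTRACT: return Math.max(1, currentNodes - 1);
            default: return currentNodes;
        }
    }

    public static void main(String[] args) {
        Map<String, Double> metrics = new HashMap<>();
        metrics.put("cpu.load", 0.95); // pretend a monitor reported this value
        DecisionEngine engine = cpuThreshold(0.80, 0.20);
        int nodes = apply(engine.decide(metrics), 3);
        System.out.println("cluster size after one cycle: " + nodes);
    }
}
```

Keeping the engine behind a single-method interface is one way to let each application supply its own policy while the loop itself stays generic.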

      Some specifics:

      • it need not be completely distributed: a specific node can be assigned to monitor/coordinate, but that node must not be fixed, i.e. the role could/should move to another node if the current coordinator leaves the cluster.
      • each application would define the set of metrics that it emits and use a local monitor process to feed them to the coordinator.
      • the monitor process should emit some standard metrics (disk I/O, CPU load, network I/O, memory).
      • the coordinator would have a pluggable decision engine policy, also defined by the application, that would consume metrics and make a decision.
      • Whirr would take care of requesting/releasing nodes and adding/removing them from the relevant services.
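A local monitor that merges standard metrics with application-defined ones might look like the sketch below. All names here (`LocalMonitorSketch`, `MetricSource`, `StandardMetrics`, the metric keys) are hypothetical. Only CPU load and JVM memory are sampled, because the Java standard library does not expose disk or network I/O counters directly; a real monitor would need another source for those.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalMonitorSketch {
    // Hypothetical plug-in point: each application registers its own metric source.
    interface MetricSource {
        Map<String, Double> sample();
    }

    // A subset of the "standard" metrics, taken from the JVM's management beans.
    static class StandardMetrics implements MetricSource {
        public Map<String, Double> sample() {
            Map<String, Double> m = new HashMap<>();
            // getSystemLoadAverage() returns a negative value when the platform cannot report it.
            m.put("cpu.load", ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
            Runtime rt = Runtime.getRuntime();
            m.put("memory.used.bytes", (double) (rt.totalMemory() - rt.freeMemory()));
            return m;
        }
    }

    private final List<MetricSource> sources = new ArrayList<>();

    LocalMonitorSketch register(MetricSource source) {
        sources.add(source);
        return this;
    }

    // One sample merges every registered source; this map is what would be sent to the coordinator.
    Map<String, Double> sample() {
        Map<String, Double> merged = new HashMap<>();
        for (MetricSource source : sources) {
            merged.putAll(source.sample());
        }
        return merged;
    }

    public static void main(String[] args) {
        LocalMonitorSketch monitor = new LocalMonitorSketch()
            .register(new StandardMetrics())
            // Made-up application metric, for illustration only.
            .register(() -> Collections.singletonMap("app.queue.depth", 42.0));
        System.out.println(monitor.sample());
    }
}
```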

      Some implementation ideas:

      • it could run on top of ZooKeeper. ZooKeeper is already a requirement for several services and would allow coordinator state to be stored reliably, so that another node can pick up if the previous coordinator leaves the cluster.
      • it could use Avro to serialize/deserialize metrics data.
      • it should be optional, i.e. simply another service that the Whirr CLI starts.
      • it would also be nice to have a monitor/coordinator web page that displays metrics and cluster status in an aggregated view.
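The coordinator hand-over idea can be illustrated with an in-memory stand-in. The sketch below does not use the ZooKeeper API; it only mimics the shape of a sequence-number election (each node gets an increasing sequence number on joining, and the node holding the lowest number acts as coordinator), which is one common way such an election is arranged. `CoordinatorElectionSketch` and its methods are hypothetical names.

```java
import java.util.TreeMap;

public class CoordinatorElectionSketch {
    // Sequence number -> node id, kept sorted so the lowest sequence is always first.
    private final TreeMap<Long, String> members = new TreeMap<>();
    private long nextSeq = 0;

    // A node joins and receives a monotonically increasing sequence number.
    long join(String nodeId) {
        long seq = nextSeq++;
        members.put(seq, nodeId);
        return seq;
    }

    // A node leaves the cluster, whether through failure or a deliberate scale-down.
    void leave(long seq) {
        members.remove(seq);
    }

    // The current coordinator is the member with the lowest sequence number, or null if none remain.
    String coordinator() {
        return members.isEmpty() ? null : members.firstEntry().getValue();
    }

    public static void main(String[] args) {
        CoordinatorElectionSketch cluster = new CoordinatorElectionSketch();
        long a = cluster.join("node-a");
        cluster.join("node-b");
        System.out.println("coordinator: " + cluster.coordinator());
        cluster.leave(a); // the coordinator leaves the cluster
        System.out.println("coordinator after hand-over: " + cluster.coordinator());
    }
}
```

In a real deployment the membership map would live in the coordination service rather than in one process, so that the surviving nodes agree on who takes over.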

            People

              Assignee: Unassigned
              Reporter: David Alves (dr-alves)
              Votes: 2
              Watchers: 4

              Dates

                Created:
                Updated: