AURORA-1918 (project: Aurora)

Allow resource monitoring to be disabled in the executor

    Details

    • Type: Task
    • Status: Reviewable
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Executor
    • Labels: None

      Description

      The Aurora executor monitors a task's resource usage (CPU, memory and disk) and kills it if its disk usage exceeds its reservation.
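The monitor-and-kill behavior described above amounts to a polling loop. The sketch below is illustrative only: the function name enforce_disk_limit, its parameters, and the injected measure_usage and kill_task callbacks are hypothetical and are not taken from the executor's actual code.

```python
import threading
from typing import Callable


def enforce_disk_limit(
    measure_usage: Callable[[], int],   # hypothetical: returns sandbox usage in bytes
    reservation_bytes: int,             # the task's disk reservation
    kill_task: Callable[[str], None],   # hypothetical: kills the task with a reason
    interval_secs: float,               # polling interval between measurements
    stop: threading.Event,              # set to end monitoring (e.g. task finished)
) -> bool:
    """Poll disk usage; kill the task once usage exceeds its reservation.

    Returns True if the task was killed, False if monitoring was stopped first.
    """
    while not stop.is_set():
        usage = measure_usage()
        if usage > reservation_bytes:
            kill_task(
                f"Disk limit exceeded: reserved {reservation_bytes} bytes, used {usage} bytes"
            )
            return True
        # Sleep for one interval, waking early if stop is set.
        stop.wait(interval_secs)
    return False
```

Every iteration pays the full cost of measure_usage, so the expense of the measurement itself is what matters.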

      Monitoring disk usage is expensive: the executor does the equivalent of running 'du' inside a container sandbox, recursively walking the sandbox to calculate usage and, in doing so, effectively thrashing the page cache. Within Twitter we've seen the executor consume an entire core while calculating disk usage; a container with 500k files can reproduce the problem.
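A minimal sketch of what a du-style walk involves, assuming a POSIX sandbox directory; sandbox_disk_usage is a hypothetical name, not the executor's function. It issues one lstat() per directory entry, so a sandbox with 500k files means 500k metadata lookups on every collection pass. For simplicity it sums apparent file sizes (like GNU du --apparent-size) rather than allocated blocks.

```python
import os
import stat


def sandbox_disk_usage(root: str) -> int:
    """Sum the apparent size of every regular file under root (symlinks not followed)."""
    total = 0
    pending = [root]  # directories still to visit (iterative, no recursion limit)
    while pending:
        directory = pending.pop()
        try:
            with os.scandir(directory) as it:
                entries = list(it)
        except OSError:
            continue  # directory vanished or is unreadable; skip it
        for entry in entries:
            try:
                st = entry.stat(follow_symlinks=False)  # one lstat() per entry
            except OSError:
                continue  # entry removed while walking
            if stat.S_ISDIR(st.st_mode):
                pending.append(entry.path)
            elif stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total
```

The cost scales with the number of entries, not the number of bytes, which is why a sandbox full of small files is the worst case.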

      The executor also calculates process metrics, but the metrics are never used.

      Mesos has a POSIX disk isolator (and an XFS isolator) that provides the same functionality: it monitors disk usage and terminates a task if it exceeds its reservation.

      Thermos Observer also monitors resource usage (see AURORA-1917), so disk usage is typically calculated three times: once each by the executor, the observer, and Mesos.

      This could be solved by adding --task_process_collection_interval_secs and --task_disk_collection_interval_secs flags to the executor, and disabling the corresponding resource collection when an interval of zero is specified.
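One way the proposed flags could behave, sketched with argparse for illustration. The flag names come from this issue, but the real executor uses its own option framework; the helpers non_negative_int and enabled_collectors and the default intervals (20 and 60 seconds) are hypothetical placeholders, not Aurora's actual values.

```python
import argparse


def non_negative_int(value: str) -> int:
    """argparse type: accept 0 (meaning 'disabled') or a positive interval."""
    parsed = int(value)
    if parsed < 0:
        raise argparse.ArgumentTypeError(f"interval must be >= 0, got {parsed}")
    return parsed


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of proposed executor flags")
    # Hypothetical defaults; a value of 0 disables the corresponding collector.
    parser.add_argument("--task_process_collection_interval_secs",
                        type=non_negative_int, default=20)
    parser.add_argument("--task_disk_collection_interval_secs",
                        type=non_negative_int, default=60)
    return parser


def enabled_collectors(args: argparse.Namespace) -> dict:
    """Map collector name to polling interval, omitting any collector set to 0."""
    intervals = {
        "process": args.task_process_collection_interval_secs,
        "disk": args.task_disk_collection_interval_secs,
    }
    return {name: secs for name, secs in intervals.items() if secs > 0}
```

For example, passing --task_disk_collection_interval_secs 0 leaves only the process collector enabled, so disk enforcement can be left to the Mesos isolator.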

        Activity

        David Robinson added a comment: https://reviews.apache.org/r/58366/

          People

          • Assignee: Reza Motamedi
          • Reporter: David Robinson
          • Votes: 0
          • Watchers: 1

            Dates

            • Created:
              Updated:
