  1. Hadoop Map/Reduce
  2. MAPREDUCE-2037

Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.23.0
    • Component/s: None
    • Labels: None
    • Release Note:
      Capture intermediate task resource consumption information:
      * Time taken so far
      * CPU load [either at the time the data are taken, or exponentially smoothed]
      * Memory load [also either at the time the data are taken, or exponentially smoothed]

      This would be taken at intervals that depend on the task progress plateaus. For example, reducers have three progress ranges - [0-1/3], (1/3-2/3], and (2/3-3/3] - where fundamentally different activities happen. Mappers have different boundaries that are not symmetrically placed: [0-9/10] and (9/10-1]. Data capture boundaries should coincide with activity boundaries. For the state information capture [CPU and memory] we should average over the covered interval.

    Description

      We would like to capture the following information at certain progress thresholds as a task runs:

      • Time taken so far
      • CPU load [either at the time the data are taken, or exponentially smoothed]
      • Memory load [also either at the time the data are taken, or exponentially smoothed]
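The "exponentially smoothed" option for the CPU and memory readings could look roughly like the following. This is only a sketch, not the code from the attached patch: the class name `SmoothedLoad` and the weight parameter `alpha` are hypothetical, chosen here for illustration.

```java
/**
 * Hypothetical helper (not from the patch): keeps an exponentially
 * smoothed view of a load reading such as CPU or memory usage.
 */
final class SmoothedLoad {
    private final double alpha;     // weight given to the newest sample, in (0, 1]
    private double value;           // current smoothed value
    private boolean seeded = false; // false until the first sample arrives

    SmoothedLoad(double alpha) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    /** Folds one raw sample into the smoothed value and returns the result. */
    double update(double sample) {
        if (!seeded) {
            value = sample;         // the first sample seeds the average
            seeded = true;
        } else {
            value = alpha * sample + (1.0 - alpha) * value;
        }
        return value;
    }

    /** Returns the current smoothed value (0.0 before any sample). */
    double get() {
        return value;
    }
}
```

With `alpha` equal to 1 this degenerates to the other option in the list, the raw reading at the time the data are taken; smaller values weight history more heavily.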

      This would be taken at intervals that depend on the task progress plateaus. For example, reducers have three progress ranges – [0-1/3], (1/3-2/3], and (2/3-3/3] – where fundamentally different activities happen. Mappers have different boundaries, I understand, that are not symmetrically placed. Data capture boundaries should coincide with activity boundaries. For the state information capture [CPU and memory] we should average over the covered interval.
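One way to picture capture at activity boundaries, with CPU and memory averaged over the covered interval, is sketched below. It assumes the task is sampled periodically with its current progress; `ProgressSplitRecorder`, `Split`, and `sample` are hypothetical names for illustration and are not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: emits one record each time progress crosses a boundary. */
final class ProgressSplitRecorder {
    /** Data captured when a progress range closes. */
    static final class Split {
        final double upperBound;   // upper end of the closed progress range
        final long elapsedMillis;  // time taken so far when the range closed
        final double meanCpu;      // CPU load averaged over the range
        final double meanMemory;   // memory load averaged over the range

        Split(double upperBound, long elapsedMillis, double meanCpu, double meanMemory) {
            this.upperBound = upperBound;
            this.elapsedMillis = elapsedMillis;
            this.meanCpu = meanCpu;
            this.meanMemory = meanMemory;
        }
    }

    private final double[] boundaries;  // ascending upper bounds, e.g. 1/3, 2/3, 1
    private final List<Split> splits = new ArrayList<>();
    private int next = 0;               // index of the next boundary to cross
    private double cpuSum = 0.0;
    private double memSum = 0.0;
    private int count = 0;

    ProgressSplitRecorder(double... boundaries) {
        this.boundaries = boundaries.clone();
    }

    /**
     * Feeds one periodic sample. Simplification: a sample that lands past a
     * boundary is still counted in the range it closes.
     */
    void sample(double progress, long elapsedMillis, double cpu, double memory) {
        cpuSum += cpu;
        memSum += memory;
        count++;
        while (next < boundaries.length && progress >= boundaries[next]) {
            // If progress jumped over a whole range, fall back to this sample.
            double meanCpu = count > 0 ? cpuSum / count : cpu;
            double meanMem = count > 0 ? memSum / count : memory;
            splits.add(new Split(boundaries[next], elapsedMillis, meanCpu, meanMem));
            cpuSum = 0.0;
            memSum = 0.0;
            count = 0;
            next++;
        }
    }

    List<Split> splits() {
        return splits;
    }
}
```

Under these assumptions a reducer would be constructed with boundaries `1.0 / 3, 2.0 / 3, 1.0`, giving one record per progress range.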

      This data would flow in with the heartbeats. It would be placed in the job history as part of the task attempt completion event, so it could be processed by rumen or some similar tool and could drive a benchmark engine.

      Attachments

        1. MAPREDUCE-2037.patch
          82 kB
          Arun Murthy
        2. MAPREDUCE-2037.patch
          90 kB
          Arun Murthy

          People

            Assignee: Dick King (dking)
            Reporter: Dick King (dking)
            Votes: 1
            Watchers: 16