  1. Hadoop Map/Reduce
  2. MAPREDUCE-6489

Fail fast rogue tasks that write too much to local disk


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.1
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: task
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Tasks of rogue jobs can write too much to local disk, negatively affecting jobs running in collocated containers. Ideally, YARN would limit the amount of local disk used by each task (YARN-4011). Until then, a MapReduce task can fail fast if it writes more than a configured threshold to local disk.

      As we discussed, the suggested approach is for the MapReduce task to check the BYTES_WRITTEN counter for the local disk and throw an exception when it exceeds a configured value. Admittedly, the number of bytes written is larger than the disk space actually in use, but detecting a rogue task does not require an exact value: a very large number of bytes written to local disk is a good indicator that the task is misbehaving.
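
      The check described above could look roughly like the following. This is a minimal sketch, not the code from the attached patches: the class name `LocalWriteLimitSketch`, the exception `WriteLimitExceededException`, the `LongSupplier` standing in for the local-disk BYTES_WRITTEN counter, and the convention that a negative limit disables the check are all assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/** Sketch of a fail-fast check on bytes written to local disk (all names are hypothetical). */
final class LocalWriteLimitSketch {

    /** Thrown when a task exceeds its local-disk write budget. */
    static final class WriteLimitExceededException extends RuntimeException {
        WriteLimitExceededException(String message) {
            super(message);
        }
    }

    // Stands in for the local-disk BYTES_WRITTEN counter (assumption).
    private final LongSupplier bytesWritten;
    // Configured threshold in bytes; a negative value disables the check (assumption).
    private final long limitBytes;

    LocalWriteLimitSketch(LongSupplier bytesWritten, long limitBytes) {
        this.bytesWritten = bytesWritten;
        this.limitBytes = limitBytes;
    }

    /** Meant to be called periodically, for example whenever the task reports progress. */
    void check() {
        if (limitBytes < 0) {
            return; // check disabled
        }
        long written = bytesWritten.getAsLong();
        if (written > limitBytes) {
            // Fail fast: the exact disk usage is unknown, but this much
            // written data already marks the task as misbehaving.
            throw new WriteLimitExceededException(
                "Too many bytes written to local disk: " + written
                    + " exceeds the limit of " + limitBytes);
        }
    }
}
```

      Because the counter only grows, the check is cheap and needs no scan of the local directories; the trade-off is that a task that repeatedly overwrites a small file can trip the limit without ever using much disk.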

      Attachments

        1. MAPREDUCE-6489-branch-2.003.patch
          12 kB
          Maysam Yabandeh
        2. MAPREDUCE-6489.003.patch
          12 kB
          Maysam Yabandeh
        3. MAPREDUCE-6489.002.patch
          10 kB
          Maysam Yabandeh
        4. MAPREDUCE-6489.001.patch
          9 kB
          Maysam Yabandeh


          People

            Assignee: Maysam Yabandeh (maysamyabandeh)
            Reporter: Maysam Yabandeh (maysamyabandeh)
            Votes: 0
            Watchers: 9
