Hadoop Map/Reduce / MAPREDUCE-6771

RMContainerAllocator sends container diagnostics event after corresponding completion event


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.3
    • Fix Version/s: 2.8.0, 2.7.4, 3.0.0-alpha2
    • Component/s: mrv2
    • Labels: None

    Description

      Task containers can exceed their resource limits and be killed by the NodeManager. The MR AM is then notified of the container status and diagnostics information through its heartbeat with the RM. However, the diagnostics information may never be written to the .jhist file, so when the job completes, the diagnostics associated with the failed task attempts are empty. This makes it hard for users to find the root cause of such job failures, which are often caused by memory leaks.
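
The issue title points at event ordering: the diagnostics event is sent after the corresponding completion event. The following is a minimal sketch of how that ordering could lose diagnostics. It is a toy model, not the Hadoop implementation: `DiagnosticsOrderingSketch`, `AttemptModel`, and `simulate` are hypothetical names, and the rule that a finished attempt ignores later diagnostics is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model. These are NOT the real Hadoop classes.
public class DiagnosticsOrderingSketch {

    /** Stand-in for a task attempt whose diagnostics are recorded on completion. */
    static class AttemptModel {
        private final List<String> diagnostics = new ArrayList<>();
        private boolean finished = false;
        private String recorded = null; // what would end up in the history record

        void onDiagnostics(String message) {
            // Assumption: updates that arrive after completion never reach the record.
            if (!finished) {
                diagnostics.add(message);
            }
        }

        void onContainerCompleted() {
            if (finished) {
                return;
            }
            finished = true;
            // Snapshot whatever diagnostics have arrived so far.
            recorded = String.join("; ", diagnostics);
        }

        String recordedDiagnostics() {
            return recorded;
        }
    }

    /** Delivers the two events in the chosen order and returns what was recorded. */
    static String simulate(boolean diagnosticsFirst, String message) {
        AttemptModel attempt = new AttemptModel();
        if (diagnosticsFirst) {
            attempt.onDiagnostics(message);
            attempt.onContainerCompleted();
        } else {
            attempt.onContainerCompleted();
            attempt.onDiagnostics(message);
        }
        return attempt.recordedDiagnostics();
    }

    public static void main(String[] args) {
        String msg = "Container killed: running beyond physical memory limits";
        System.out.println("completion first : '" + simulate(false, msg) + "'");
        System.out.println("diagnostics first: '" + simulate(true, msg) + "'");
    }
}
```

In this model, delivering the completion event first records an empty string, while delivering the diagnostics first records the message. Under the stated assumption, dispatching diagnostics before completion would be one plausible remedy.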

      Attachments

        1. mapreduce6771.001.patch
          1 kB
          Haibo Chen
        2. mapreduce6771.002.patch
          8 kB
          Haibo Chen
        3. mapreduce6771.003.patch
          8 kB
          Haibo Chen
        4. mapreduce6771.004.patch
          8 kB
          Haibo Chen
        5. mapreduce6771.branch-2.8.patch
          8 kB
          Haibo Chen
        6. TaUnsuccessfullyEventEmission.jpg
          1.69 MB
          Haibo Chen

          People

            Assignee: Haibo Chen (haibochen)
            Reporter: Haibo Chen (haibochen)
            Votes: 0
            Watchers: 5
