Hadoop YARN / YARN-4946

RM should not consider an application as COMPLETED when log aggregation is not in a terminal state

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 3.2.0
    • Component/s: log-aggregation
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      MAPREDUCE-6415 added a tool that combines the aggregated log files for each YARN application into a HAR file. When run, it seeds the list of candidates by scanning the aggregated logs directory, and then filters out ineligible applications. One of the criteria involves checking with the RM that an application's log aggregation is no longer running and has not failed. When the RM "forgets" about an older completed application (e.g. after an RM failover, or once enough time has passed), the tool cannot find the application in the RM and simply assumes that its log aggregation succeeded, even if it actually failed or is still running.
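
      To make the failure mode concrete, here is a minimal sketch of the filtering step described above. It is not the tool's actual code: the class name, the simplified AggStatus enum, the filterEligible method, and the map standing in for the RM's view are all hypothetical, and treating TIME_OUT as ineligible is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the MAPREDUCE-6415 implementation.
class ArchiveEligibilitySketch {
    // Simplified stand-in for the log aggregation states named in this issue.
    enum AggStatus { RUNNING, SUCCEEDED, FAILED, TIME_OUT }

    // appsOnDisk: application IDs found in the aggregated logs directory.
    // rmView: what the RM currently remembers (a forgotten app has no entry).
    static List<String> filterEligible(List<String> appsOnDisk,
                                       Map<String, AggStatus> rmView) {
        List<String> eligible = new ArrayList<>();
        for (String appId : appsOnDisk) {
            AggStatus status = rmView.get(appId);
            if (status == null) {
                // The problematic branch: the RM has no record of the app,
                // so aggregation is assumed to have succeeded.
                eligible.add(appId);
            } else if (status == AggStatus.SUCCEEDED) {
                eligible.add(appId);
            }
            // RUNNING, FAILED and (by assumption) TIME_OUT are filtered out.
        }
        return eligible;
    }

    public static void main(String[] args) {
        Map<String, AggStatus> rmView = new HashMap<>();
        rmView.put("app_1", AggStatus.SUCCEEDED);
        rmView.put("app_2", AggStatus.RUNNING);
        // app_3 exists on disk but the RM no longer knows about it.
        List<String> onDisk = Arrays.asList("app_1", "app_2", "app_3");
        System.out.println(filterEligible(onDisk, rmView));
    }
}
```

      In this sketch app_3 passes the filter purely because the RM has no record of it, which is the gap this issue describes.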

      We can solve this problem by doing the following:
      The RM should not consider an app to be fully completed (and thus removed from its history) until the aggregation status has reached a terminal state (e.g. SUCCEEDED, FAILED, TIME_OUT).
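
      The proposed rule could be sketched roughly as follows. This is an illustration under stated assumptions rather than the patch itself: CompletionGateSketch, the simplified AggStatus enum, and canForget are hypothetical names, and cases such as disabled log aggregation are deliberately left out.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical illustration only; not the code from the attached patches.
class CompletionGateSketch {
    // Simplified stand-in for the log aggregation states.
    enum AggStatus { NOT_START, RUNNING, SUCCEEDED, FAILED, TIME_OUT }

    // Terminal states as listed in the description above.
    static final Set<AggStatus> TERMINAL = Collections.unmodifiableSet(
        EnumSet.of(AggStatus.SUCCEEDED, AggStatus.FAILED, AggStatus.TIME_OUT));

    static boolean isTerminal(AggStatus status) {
        return TERMINAL.contains(status);
    }

    // An app may be dropped from the RM's history only when it has finished
    // AND its log aggregation has reached a terminal state.
    static boolean canForget(boolean appFinished, AggStatus status) {
        return appFinished && isTerminal(status);
    }

    public static void main(String[] args) {
        for (AggStatus s : AggStatus.values()) {
            System.out.println(s + " -> canForget=" + canForget(true, s));
        }
    }
}
```

      With a gate like this, a finished application whose aggregation is still RUNNING stays known to the RM, so a later status query returns the real state instead of nothing.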

      Attachments

        1. YARN-4946.001.patch
          19 kB
          Szilard Nemeth
        2. YARN-4946.002.patch
          19 kB
          Szilard Nemeth
        3. YARN-4946.003.patch
          45 kB
          Szilard Nemeth
        4. YARN-4946.004.patch
          27 kB
          Szilard Nemeth

          People

            Assignee: Szilard Nemeth (snemeth)
            Reporter: Robert Kanter (rkanter)
            Votes: 0
            Watchers: 14
