  1. Hadoop YARN
  2. YARN-522 [Umbrella] Better reporting for crashed/Killed AMs and Containers
  3. YARN-4309

Add container launch related debug information to container logs when a container fails

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: nodemanager
    • Labels: None
    • Hadoop Flags: Reviewed

    Description

      Sometimes when a container fails, it can be pretty hard to figure out why it failed.

      My proposal is that if a container fails, we collect information about the container local dir and dump it into the container log dir. Ideally, I'd like to tar up the entire directory, but I'm not sure of the security and space implications of such an approach. At the very least, we can list all the files in the container local dir and dump the contents of launch_container.sh into the container log dir.

      When log aggregation occurs, all this information will be collected automatically, making such failures much easier to debug.
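
The minimal variant of the proposal (list the files, copy the launch script) can be sketched roughly as follows. This is an illustration only, not the code from the attached patches: the function name `dump_launch_debug_info` and the output file name `directory.info` are assumptions made for the example, and a real NodeManager would do this as part of container launch rather than in a standalone script.

```python
import shutil
from pathlib import Path


def dump_launch_debug_info(local_dir, log_dir):
    """Write a file listing and a copy of launch_container.sh into log_dir.

    Hypothetical helper: both the name and the output layout are assumptions.
    """
    local_dir, log_dir = Path(local_dir), Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    # List every entry under the container local dir, one relative path per line.
    listing = log_dir / "directory.info"  # assumed output file name
    entries = sorted(p.relative_to(local_dir).as_posix() for p in local_dir.rglob("*"))
    listing.write_text("\n".join(entries) + "\n")

    # Copy the launch script, if present, next to the container's other logs.
    script = local_dir / "launch_container.sh"
    copied = None
    if script.is_file():
        copied = log_dir / script.name
        shutil.copyfile(script, copied)
    return listing, copied
```

Writing only a listing and one script keeps the extra log volume small, which sidesteps the space and security concerns raised about tarring up the whole directory.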

      Attachments

        1. YARN-4309.010.patch
          20 kB
          Varun Vasudev
        2. YARN-4309.009.patch
          20 kB
          Varun Vasudev
        3. YARN-4309.008.patch
          19 kB
          Varun Vasudev
        4. YARN-4309.007.patch
          19 kB
          Varun Vasudev
        5. YARN-4309.006.patch
          19 kB
          Varun Vasudev
        6. YARN-4309.005.patch
          19 kB
          Varun Vasudev
        7. YARN-4309.004.patch
          18 kB
          Varun Vasudev
        8. YARN-4309.003.patch
          18 kB
          Varun Vasudev
        9. YARN-4309.002.patch
          17 kB
          Varun Vasudev
        10. YARN-4309.001.patch
          15 kB
          Varun Vasudev

          People

            Assignee: Varun Vasudev (vvasudev)
            Reporter: Varun Vasudev (vvasudev)
            Votes: 0
            Watchers: 17

            Dates

              Created:
              Updated:
              Resolved: