  1. Hadoop YARN
  2. YARN-4791

Per-user node blacklist for user-specific container launch failures


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • None
    • None
    • Component/s: applications
    • None

    Description

      Some container launch failures are specific to the user rather than to the application. For example,
      when LinuxContainerExecutor is enabled but the submitting user does not exist on some node, container launch on that node fails with the following output:

      2016-02-14 15:37:03,111 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: 
      appattempt_1434045496283_0036_000002 State change from LAUNCHED to FAILED 
      2016-02-14 15:37:03,111 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Application application_1434045496283_0036 failed 2 times due to AM Container for 
      appattempt_1434045496283_0036_000002 exited with exitCode: -1000 due to: 
      Application application_1434045496283_0036 initialization failed (exitCode=255) with output: User jdu not found 
      

      Such a node is clearly unsuitable for launching containers for any of this user's other applications as well. We therefore need a blacklist tracking mechanism that works per user rather than only per application.
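
      As a rough illustration of the per-user tracking proposed above, the following minimal Java sketch counts user-specific launch failures per (user, node) pair and treats a node as blacklisted for a user once a threshold is reached. The class name PerUserNodeBlacklist, its methods, and the failure threshold are all hypothetical; none of them are part of the YARN API, and how a failure is classified as user-specific (for instance, a "User ... not found" diagnostic) is left outside the sketch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of the YARN API. */
final class PerUserNodeBlacklist {
    // Assumed policy: a node is blacklisted for a user after this many
    // user-specific launch failures on that node.
    private final int failureThreshold;

    // user -> (node -> number of user-specific launch failures)
    private final Map<String, Map<String, Integer>> failures = new HashMap<>();

    PerUserNodeBlacklist(int failureThreshold) {
        if (failureThreshold < 1) {
            throw new IllegalArgumentException("failureThreshold must be >= 1");
        }
        this.failureThreshold = failureThreshold;
    }

    /** Records one user-specific launch failure for the given user on the given node. */
    synchronized void recordUserSpecificFailure(String user, String node) {
        failures.computeIfAbsent(user, u -> new HashMap<>())
                .merge(node, 1, Integer::sum);
    }

    /** Returns true if the node has reached the failure threshold for this user. */
    synchronized boolean isBlacklisted(String user, String node) {
        return failures.getOrDefault(user, Collections.emptyMap())
                       .getOrDefault(node, 0) >= failureThreshold;
    }

    /** Returns the nodes currently blacklisted for this user, across all of the user's applications. */
    synchronized Set<String> blacklistedNodes(String user) {
        Set<String> result = new HashSet<>();
        Map<String, Integer> perNode = failures.getOrDefault(user, Collections.emptyMap());
        for (Map.Entry<String, Integer> entry : perNode.entrySet()) {
            if (entry.getValue() >= failureThreshold) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

      Because the state is keyed by user rather than by application, a failure recorded while launching one of jdu's applications would also steer jdu's later applications away from the same node, while other users remain unaffected.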


          People

            Assignee: Junping Du (junping_du)
            Reporter: Junping Du (junping_du)
            Votes: 0
            Watchers: 3
