  1. Hadoop YARN
  2. YARN-10516

In HA mode, when one Resource Manager has a networking issue, getTokenService() should not throw a runtime exception


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: client
    • Labels: None

    Description

      We have observed an issue in the YARN client around this piece of code:

      https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L145

       

      When the following call runs:

      services.add(SecurityUtil.buildTokenService(yarnConf.getSocketAddr(address, defaultAddr, defaultPort)).toString());

      buildTokenService() fails with a runtime exception (more specifically, one caused by an UnknownHostException) thrown from here: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L466
      This happened while one of the RM hosts had a networking issue and its IP address could not be resolved.

      This runtime exception then propagates all the way up to our application and causes MR job submission to fail.

      In my opinion, since we are running in HA mode, the other RMs are still alive and available. We should catch this exception in getTokenService() and handle it properly instead of failing the whole operation.
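The idea can be illustrated with a minimal, self-contained sketch. It does not use the Hadoop classes and is not necessarily what the attached patches do. `TokenServiceSketch` and its `buildTokenService()` are hypothetical stand-ins that only mimic the failure described above (a runtime exception caused by an UnknownHostException for an unresolved address). The comma-joined result and the choice to fail only when no RM at all can be resolved are assumptions of this sketch.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TokenServiceSketch {

    // Hypothetical stand-in for SecurityUtil.buildTokenService(): rejects an
    // unresolved address with a runtime exception caused by UnknownHostException.
    static String buildTokenService(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException(
                new UnknownHostException(addr.getHostName()));
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    // Builds one token service string over all RM addresses, skipping any RM
    // whose host cannot be resolved instead of failing the whole call.
    static String getTokenService(List<InetSocketAddress> rmAddresses) {
        List<String> services = new ArrayList<>();
        for (InetSocketAddress addr : rmAddresses) {
            try {
                services.add(buildTokenService(addr));
            } catch (IllegalArgumentException e) {
                // Only swallow the "host cannot be resolved" case.
                if (!(e.getCause() instanceof UnknownHostException)) {
                    throw e;
                }
                System.err.println("Skipping unresolvable RM: " + addr.getHostName());
            }
        }
        // Assumption of this sketch: still fail if no RM is usable at all.
        if (services.isEmpty()) {
            throw new IllegalStateException("No Resource Manager address could be resolved");
        }
        return String.join(",", services);
    }

    public static void main(String[] args) {
        List<InetSocketAddress> rms = Arrays.asList(
            new InetSocketAddress("127.0.0.1", 8032),                 // literal IP, always resolved
            InetSocketAddress.createUnresolved("rm2.invalid", 8032)); // simulates the broken RM
        System.out.println(getTokenService(rms)); // prints "127.0.0.1:8032"
    }
}
```

With this shape, a single unreachable RM only produces a warning, while a configuration in which every RM is unresolvable still surfaces as an error to the caller.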

       

       

      I would like to hear your opinion on this. If you agree, I will provide a patch. Thank you.

      Attachments

        1. YARN-10516.007.patch
          3 kB
          Xu Cang
        2. YARN-10516.004.patch
          3 kB
          Xu Cang
        3. YARN-10516.003.patch
          2 kB
          Xu Cang
        4. YARN-10516.002.patch
          2 kB
          Xu Cang
        5. YARN-10516.001.patch
          2 kB
          Xu Cang


          People

            Assignee: Unassigned
            Reporter: Xu Cang (xucang)
