  1. Hadoop YARN
  2. YARN-10516

In HA mode, when one Resource Manager has a networking issue, getTokenService() should not throw a runtime exception

    Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: client
    • Labels:
      None

      Description

      We have observed an issue in the YARN client around this piece of code:

      https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L145

       

      While

      services.add(SecurityUtil.buildTokenService(yarnConf.getSocketAddr(address, defaultAddr, defaultPort)).toString());

      is being called, buildTokenService() fails and throws a runtime exception caused by an UnknownHostException, raised here: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L466
      This happened while one of the RM hosts was having a networking issue and its host name could not be resolved to an IP address.

      This runtime exception then propagates all the way up to our application and causes MR job submission to fail.

      In my opinion, since we are running in HA mode, the other RMs are still alive and available. We should catch this exception in getTokenService() and handle it properly, instead of failing the whole operation.
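The handling proposed above could look roughly like the following. This is only a minimal, self-contained sketch and not the content of the attached patches: `TokenServiceSketch` and both of its methods are hypothetical stand-ins written against plain `java.net`, the stand-in `buildTokenService()` only imitates the unresolved-address failure described in this issue, and joining the services with a comma is an assumption made for illustration.

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

class TokenServiceSketch {

  // Hypothetical stand-in for SecurityUtil.buildTokenService(): rejects an
  // address whose host name could not be resolved to an IP address.
  static String buildTokenService(InetSocketAddress addr) {
    if (addr.isUnresolved()) {
      throw new IllegalArgumentException(
          new UnknownHostException(addr.getHostName()));
    }
    return addr.getAddress().getHostAddress() + ":" + addr.getPort();
  }

  // Builds the joined token service string, skipping RMs whose address
  // cannot be resolved instead of failing on the first bad one.
  static String getTokenService(List<InetSocketAddress> rmAddrs) {
    List<String> services = new ArrayList<>();
    for (InetSocketAddress addr : rmAddrs) {
      try {
        services.add(buildTokenService(addr));
      } catch (IllegalArgumentException e) {
        // Only swallow the unresolved-host case; rethrow anything else.
        if (!(e.getCause() instanceof UnknownHostException)) {
          throw e;
        }
        System.err.println("Skipping unresolvable RM address: " + addr);
      }
    }
    // If no RM at all can be resolved, failing is still the right outcome.
    if (services.isEmpty()) {
      throw new IllegalArgumentException("No RM address could be resolved");
    }
    return String.join(",", services); // assumed separator
  }

  public static void main(String[] args) {
    List<InetSocketAddress> rms = List.of(
        new InetSocketAddress("127.0.0.1", 8032),                 // resolvable
        InetSocketAddress.createUnresolved("rm2.invalid", 8032)); // simulated outage
    System.out.println(getTokenService(rms));
  }
}
```

Running `main` prints `127.0.0.1:8032`: the unresolved RM is logged and skipped, and the call only fails when none of the configured RMs can be resolved.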

       

       

      I would like to hear your opinion on this. If you agree, I will provide a patch. Thank you.

        Attachments

        1. YARN-10516.001.patch
          2 kB
          Xu Cang
        2. YARN-10516.002.patch
          2 kB
          Xu Cang
        3. YARN-10516.003.patch
          2 kB
          Xu Cang
        4. YARN-10516.004.patch
          3 kB
          Xu Cang
        5. YARN-10516.007.patch
          3 kB
          Xu Cang

            People

            • Assignee:
              Unassigned
              Reporter:
              xucang Xu Cang
            • Votes:
              0
              Watchers:
              5
