  1. Hadoop YARN
  2. YARN-10479

RMProxy should retry on SocketTimeout Exceptions


    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.10.1, 3.4.0
    • Fix Version/s: 3.2.2, 3.4.0, 3.3.1, 3.1.5, 3.2.3
    • Component/s: yarn
    • Labels:
      None

      Description

      During an incident involving a DNS outage, a large number of nodemanagers failed to come back into service because they hit a socket timeout when trying to re-register with the RM.

      SocketTimeoutException is not currently one of the exceptions that the RMProxy will retry. Based on this incident, it seems like it should be. We made this change internally about a year ago and it has been running in production since.
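      To illustrate the behavior being requested, here is a minimal, self-contained sketch of a retry loop that treats SocketTimeoutException as retriable. It is not the RMProxy code or the attached patch: the class name RetrySketch, the method callWithRetry, the RETRIABLE set, and the attempt/sleep parameters are all hypothetical and use only the JDK. Note that java.net.SocketTimeoutException extends java.io.InterruptedIOException rather than java.net.SocketException, so a policy that matches on exception class has to list it explicitly.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.Set;
import java.util.concurrent.Callable;

// Hypothetical sketch only; this is NOT the RMProxy implementation or the patch.
class RetrySketch {
    // Exception classes treated as transient. SocketTimeoutException is the
    // addition this issue asks for; ConnectException is here for illustration.
    static final Set<Class<? extends Exception>> RETRIABLE = Set.of(
            ConnectException.class,
            SocketTimeoutException.class);

    // Runs the call up to maxAttempts times, sleeping between attempts,
    // and retries only when the thrown exception's class is in RETRIABLE.
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long sleepMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                if (!RETRIABLE.contains(e.getClass())) {
                    throw e; // not a transient failure: fail immediately
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last; // every attempt hit a retriable exception
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated registration call that times out twice before succeeding.
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("simulated timeout");
            }
            return "registered";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

      With SocketTimeoutException removed from the set, the first simulated timeout would propagate to the caller immediately, which resembles the nodemanager failure described above.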

        Attachments

        1. YARN-10479.001.patch
          1 kB
          Jim Brennan
        2. YARN-10479.002.patch
          6 kB
          Jim Brennan
        3. YARN-10479.003.patch
          6 kB
          Jim Brennan

