  1. Hadoop YARN
  2. YARN-5597 YARN Federation improvements
  3. YARN-8673

[AMRMProxy] More robust responseId resync after a YarnRM master-slave switch

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.10.0, 3.2.0
    • Component/s: amrmproxy
    • Labels:
      None

      Description

      After a master-slave switch of YarnRM, the new YarnRM throws an ApplicationNotRegisteredException. The AM then re-registers and resets its responseId to zero. AMRMClientRelayer inside FederationInterceptor follows the same protocol and performs the re-register and responseId resync automatically. However, if an exception or a temporary network issue occurs in the allocate call after the re-register, the resync logic might break. This patch makes the process more robust by parsing the expected responseId from the YarnRM exception message, so that whenever the responseId gets out of sync, for whatever reason, we can automatically resync and move on.
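
      For illustration only, the resync idea might look roughly like the sketch below. It is not the code from the attached patches. The class name ResponseIdResync, the Allocator interface, the helper names and the message wording "expect responseId to be N" are all assumptions made for this example; the actual YarnRM message text and the AMRMClientRelayer internals may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, self-contained sketch; not taken from the YARN-8673 patch.
final class ResponseIdResync {

  // Assumed wording of the mismatch message; the real RM text may differ.
  private static final Pattern EXPECTED_ID =
      Pattern.compile("expect responseId to be (\\d+)");

  /** Hypothetical stand-in for the RM allocate call: takes the responseId sent
   *  by the client and returns the responseId to use on the next heartbeat. */
  public interface Allocator {
    int allocate(int responseId) throws Exception;
  }

  /** Returns the expected responseId found in the message, or -1 if none. */
  public static int parseExpectedResponseId(String message) {
    if (message == null) {
      return -1;
    }
    Matcher m = EXPECTED_ID.matcher(message);
    if (!m.find()) {
      return -1;
    }
    try {
      return Integer.parseInt(m.group(1));
    } catch (NumberFormatException e) {
      return -1; // digits present but out of int range
    }
  }

  /** Calls allocate once; on a responseId mismatch, retries once with the
   *  responseId that the exception message says the RM expects. */
  public static int allocateWithResync(Allocator rm, int responseId)
      throws Exception {
    try {
      return rm.allocate(responseId);
    } catch (Exception e) {
      int expected = parseExpectedResponseId(e.getMessage());
      if (expected < 0) {
        throw e; // not a responseId mismatch, let the caller handle it
      }
      return rm.allocate(expected);
    }
  }
}
```

      The point of the sketch is that the client does not need to track why the responseId drifted: any mismatch reported by the RM carries enough information to recover on the next call.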

        Attachments

        1. YARN-8673-branch-2.v2.patch
          21 kB
          Botong Huang
        2. YARN-8673.v2.patch
          21 kB
          Botong Huang
        3. YARN-8673.v1.patch
          17 kB
          Botong Huang


            People

            • Assignee:
              botong Botong Huang
              Reporter:
              botong Botong Huang
            • Votes:
              0
              Watchers:
              5
