  1. Hadoop YARN
  2. YARN-5597 YARN Federation improvements
  3. YARN-6667

Handle containerId duplicate without failing the heartbeat in Federation Interceptor

Details

    • Reviewed

    Description

      In practice, a duplicate containerId is very unlikely.
      It can only be caused by a YARN master-slave failover and an incorrectly configured Epoch parameter.

      We will try to tolerate this situation and keep the Application running where possible, using the following measures:
      1. Select for allocation a node whose heartbeat has not timed out and which is in the RUNNING state.
      2. If neither RM's heartbeat has timed out and both RMs are in the RUNNING state, select the previously allocated RM to handle the Container.
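
The two measures above can be sketched roughly as follows. This is a minimal illustration, not the actual Federation Interceptor code: the types `SubClusterState`, `RmCandidate` and `DuplicateContainerResolver`, the heartbeat timeout parameter, and the "return empty when neither RM is usable" behaviour are all assumptions made for the example.

```java
import java.util.Optional;

/** Hypothetical RM state; only RUNNING is named in the description. */
enum SubClusterState { RUNNING, LOST, UNHEALTHY }

/** Hypothetical view of one RM that reported the duplicated containerId. */
final class RmCandidate {
  final String subClusterId;
  final SubClusterState state;
  final long lastHeartbeatMillis;

  RmCandidate(String subClusterId, SubClusterState state, long lastHeartbeatMillis) {
    this.subClusterId = subClusterId;
    this.state = state;
    this.lastHeartbeatMillis = lastHeartbeatMillis;
  }

  /** Measure 1: heartbeat has not timed out and the RM is RUNNING. */
  boolean isUsable(long nowMillis, long timeoutMillis) {
    return state == SubClusterState.RUNNING
        && nowMillis - lastHeartbeatMillis <= timeoutMillis;
  }
}

final class DuplicateContainerResolver {
  /**
   * Picks the RM that should own a container reported by two RMs.
   * previous: the RM the container was already attributed to.
   * incoming: the RM that has just reported the same containerId.
   */
  static Optional<RmCandidate> choose(RmCandidate previous, RmCandidate incoming,
                                      long nowMillis, long timeoutMillis) {
    // Measure 2: if both are usable, the previously allocated RM wins.
    // This branch also covers the case where only the previous RM is usable.
    if (previous.isUsable(nowMillis, timeoutMillis)) {
      return Optional.of(previous);
    }
    // Measure 1: otherwise fall back to the incoming RM if it is usable.
    if (incoming.isUsable(nowMillis, timeoutMillis)) {
      return Optional.of(incoming);
    }
    // Assumption: the description does not cover this case, so the sketch
    // reports "no choice" instead of failing the heartbeat.
    return Optional.empty();
  }
}
```

Returning an `Optional` rather than throwing keeps the caller free to log the duplicate and continue processing the heartbeat, which is the intent stated in the issue title.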

            People

              slfan1989 Shilun Fan
              botong Botong Huang
              Votes: 0
              Watchers: 3

              Dates

                Created:
                Updated:
                Resolved: