Details
Sub-task
Status: Resolved
Minor
Resolution: Fixed
3.4.0
Reviewed
Description
In practice, the probability of this happening is very low.
It can only be caused by a YARN master-slave failover together with a wrong Epoch parameter configuration.
We will try to be compatible with this situation and keep the Application running as far as possible, using the following measures:
1. Select a node whose heartbeat has not timed out for allocation, and also require that node to be in the RUNNING state.
2. If neither RM's heartbeat has timed out and both RMs are in the RUNNING state, select the previously allocated RM for Container processing.
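The two measures above can be sketched as a small selection routine. This is only an illustrative sketch, not the code from the patch: the class `RmSelectionSketch`, the `Candidate` type, its fields, and the `select` method are all hypothetical names, and the heartbeat timeout is passed in as a plain millisecond value. The description does not say what happens when several candidates are healthy but none of them is the previously allocated one, so the fallback to the first healthy candidate is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the selection rules; none of these names come from Hadoop.
public class RmSelectionSketch {

    public enum State { NEW, RUNNING, LOST }

    // Hypothetical view of one candidate: an id, its state, and its last heartbeat time.
    public static final class Candidate {
        final String id;
        final State state;
        final long lastHeartbeatMillis;

        public Candidate(String id, State state, long lastHeartbeatMillis) {
            this.id = id;
            this.state = state;
            this.lastHeartbeatMillis = lastHeartbeatMillis;
        }
    }

    // Measure 1: the heartbeat must not have timed out and the state must be RUNNING.
    static boolean isHealthy(Candidate c, long nowMillis, long timeoutMillis) {
        return c.state == State.RUNNING
                && nowMillis - c.lastHeartbeatMillis <= timeoutMillis;
    }

    // Returns the id of the candidate to use, or empty if no candidate is healthy.
    public static Optional<String> select(List<Candidate> candidates,
                                          String previouslyAllocatedId,
                                          long nowMillis,
                                          long timeoutMillis) {
        List<Candidate> healthy = new ArrayList<>();
        for (Candidate c : candidates) {
            if (isHealthy(c, nowMillis, timeoutMillis)) {
                healthy.add(c);
            }
        }
        if (healthy.isEmpty()) {
            return Optional.empty();
        }
        // Measure 2: when more than one candidate is healthy, prefer the previous one.
        for (Candidate c : healthy) {
            if (c.id.equals(previouslyAllocatedId)) {
                return Optional.of(c.id);
            }
        }
        // Assumption: the previous candidate is not healthy, so take the first healthy one.
        return Optional.of(healthy.get(0).id);
    }
}
```

With two healthy candidates the previously allocated one wins. If its heartbeat has timed out, the other healthy candidate is chosen instead.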