This prototype is a way to understand the overall design, the major issues that need to be addressed, and the minor details that crop up along the way.
It is not a substitute for the actual code and unit tests for each sub-task.
Hopefully it will help the discussion of the overall approach and of each sub-task.
The prototype demonstrates the following changes:
1. Containers that were running when the RM restarted continue running.
2. On resync, the NM sends the list of running containers as ContainerReport objects, so that each container's capability (size) is provided.
3. On resync, the AM re-registers instead of shutting down. The AM can make further requests after the RM restart, and they are accepted.
4. A sample of the scheduler changes, in FairScheduler. It re-registers the application attempt on recovery. On NM addNode, it adds the reported containers to that application attempt and charges them to it so that usage is tracked correctly.
5. Applications and containers resume their lifecycle, with additional transitions to support continuation after recovery.
6. The cluster timestamp is added to the containerId so that containerIds issued after an RM restart do not clash with those issued before it (the in-memory containerId counter resets to zero on restart).
7. The changes are controlled by a flag.
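Item 6 can be illustrated with a minimal sketch. The class names below (SketchContainerId, SketchResourceManager) are hypothetical and are not the prototype's or YARN's actual classes; the sketch only assumes that an ID is the pair (cluster timestamp, in-memory counter value) and that each RM start gets a different timestamp.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

public class ContainerIdSketch {

    /** Hypothetical ID type for illustration; not the real YARN ContainerId. */
    static final class SketchContainerId {
        final long clusterTimestamp; // RM start time, differs per RM incarnation
        final int sequence;          // value taken from the in-memory counter

        SketchContainerId(long clusterTimestamp, int sequence) {
            this.clusterTimestamp = clusterTimestamp;
            this.sequence = sequence;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SketchContainerId)) {
                return false;
            }
            SketchContainerId other = (SketchContainerId) o;
            return clusterTimestamp == other.clusterTimestamp
                    && sequence == other.sequence;
        }

        @Override
        public int hashCode() {
            return Objects.hash(clusterTimestamp, sequence);
        }

        @Override
        public String toString() {
            return "container_" + clusterTimestamp + "_" + sequence;
        }
    }

    /** Stands in for one RM incarnation; the counter exists only in memory. */
    static final class SketchResourceManager {
        private final long clusterTimestamp;
        private final AtomicInteger counter = new AtomicInteger(0);

        SketchResourceManager(long startTimeMillis) {
            this.clusterTimestamp = startTimeMillis;
        }

        SketchContainerId nextContainerId() {
            return new SketchContainerId(clusterTimestamp, counter.incrementAndGet());
        }
    }

    public static void main(String[] args) {
        // First RM incarnation hands out an ID.
        SketchResourceManager beforeRestart = new SketchResourceManager(1000L);
        SketchContainerId oldId = beforeRestart.nextContainerId();

        // After a restart the counter starts from zero again,
        // but the cluster timestamp is different.
        SketchResourceManager afterRestart = new SketchResourceManager(2000L);
        SketchContainerId newId = afterRestart.nextContainerId();

        System.out.println(oldId + " equals " + newId + "? " + oldId.equals(newId));
    }
}
```

Both IDs carry the same counter value, yet they compare as different because the timestamps differ. Without the timestamp component, the two IDs would be indistinguishable.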
Topics not addressed
1. Key and token changes.
2. The AM does not yet resend requests that it sent before the restart. If the RM restarts after the AM has made a request but before the RM has returned a container, the AM is left waiting for the allocation. Only new asks made after the RM restart work.
3. Completed container status, as described in the design, is not handled yet.
Readme for running through the prototype
a) Set up a cluster with RM recovery turned on and the scheduler set to FairScheduler.
b) Start a sleep job with both map and reduce tasks, for example:
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT.jar sleep -mt 12000 -rt 12000
c) Restart the RM (yarn-daemon.sh stop/start resourcemanager) and check that the containers are not restarted.
The following two scenarios work:
1. Restart the RM while the reduce is running. The reduce continues and the application then completes successfully. This demonstrates that running containers continue without being restarted.
2. Restart the RM while the map is running. The map continues, the reduce then executes, and the application completes successfully. In addition to the previous scenario, this demonstrates that requesting more resources after the restart works.
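Both scenarios rely on the scheduler recovery path described in item 4 of the list of changes. A rough sketch of that bookkeeping follows. It is not the FairScheduler code from the prototype: every name in it (ReportedContainer, AttemptUsage, recoverAttempt, addNode) is hypothetical, it tracks memory only, and skipping containers of unknown attempts is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchedulerRecoverySketch {

    /** Hypothetical stand-in for one running container reported by an NM on resync. */
    static final class ReportedContainer {
        final String attemptId;
        final String containerId;
        final int memoryMb; // the container's capability (size)

        ReportedContainer(String attemptId, String containerId, int memoryMb) {
            this.attemptId = attemptId;
            this.containerId = containerId;
            this.memoryMb = memoryMb;
        }
    }

    /** Per-attempt usage tracked by the sketch scheduler. */
    static final class AttemptUsage {
        final List<String> liveContainers = new ArrayList<>();
        int usedMemoryMb;
    }

    private final Map<String, AttemptUsage> attempts = new HashMap<>();

    /** Step 1: on recovery, the application attempt is registered again. */
    void recoverAttempt(String attemptId) {
        attempts.putIfAbsent(attemptId, new AttemptUsage());
    }

    /** Step 2: when an NM is added, its running containers are charged to their attempts. */
    void addNode(List<ReportedContainer> runningContainers) {
        for (ReportedContainer c : runningContainers) {
            AttemptUsage usage = attempts.get(c.attemptId);
            if (usage == null) {
                continue; // assumption: containers of unknown attempts are skipped here
            }
            usage.liveContainers.add(c.containerId);
            usage.usedMemoryMb += c.memoryMb;
        }
    }

    int usedMemoryMb(String attemptId) {
        AttemptUsage usage = attempts.get(attemptId);
        return usage == null ? 0 : usage.usedMemoryMb;
    }

    public static void main(String[] args) {
        SchedulerRecoverySketch scheduler = new SchedulerRecoverySketch();
        scheduler.recoverAttempt("attempt_1");

        List<ReportedContainer> report = new ArrayList<>();
        report.add(new ReportedContainer("attempt_1", "c1", 1024));
        report.add(new ReportedContainer("attempt_1", "c2", 2048));
        scheduler.addNode(report);

        System.out.println("attempt_1 uses " + scheduler.usedMemoryMb("attempt_1") + " MB");
    }
}
```

Charging the reported sizes back to the attempt is what lets later asks, such as the reduce requests in scenario 2, be scheduled against accurate usage.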