Updating the patch based on Jian He's suggestions and rebasing with latest
Do we need to do something for this condition? If not, it can be removed.
Yeah, it can be removed. I had put it there to remind me of something and forgot to remove it.
In RollbackContainerTransition: container.getResourceSet() will return all resources, including both the current and previous versions. Shouldn't we re-request only the previous version's resources, rather than the union of both?
In the latest patch, the resourceSet is reverted to previous state as well.
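To make the concern concrete, here is a minimal sketch of reverting a resource set on rollback so that only the previous version's resources are re-requested. The class `ResourceSetSketch` and its methods are hypothetical names for illustration and are not the NM's actual ResourceSet API.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of a container's resource set; not the actual NM class. */
class ResourceSetSketch {
    // Resources currently associated with the container.
    private Set<String> resources = new LinkedHashSet<>();
    // Snapshot of the resources as they were before the upgrade started.
    private Set<String> previous;

    void addResources(Set<String> toAdd) {
        resources.addAll(toAdd);
    }

    /** Remember the pre-upgrade state, then merge in the new version's resources. */
    void beginUpgrade(Set<String> newVersionResources) {
        previous = new LinkedHashSet<>(resources);
        resources.addAll(newVersionResources);
    }

    /** During an upgrade this is the union of both versions. */
    Set<String> getAllResources() {
        return Collections.unmodifiableSet(resources);
    }

    /** Revert to the pre-upgrade snapshot and return only those resources. */
    Set<String> rollback() {
        if (previous == null) {
            throw new IllegalStateException("No upgrade in progress");
        }
        resources = previous;
        previous = null;
        return Collections.unmodifiableSet(resources);
    }
}
```

With this shape, the rollback path re-requests whatever `rollback()` returns instead of `getAllResources()`, which would still contain the failed version's resources.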
I still have a question on the commit API: how does the AM use this API in practice?
Commit is just a way for the AM to tell the NM that it is fine with the upgrade (perhaps after it performs some upgrade diagnostic checks on the container) and that the container is working as it should. After the AM does a commit, the container cannot be rolled back, and any bookkeeping required to roll back (the reInitContext, for example) is deleted by the NM.
Prior to a commit, if the upgraded Container fails, NM can choose to automatically rollback. After the AM issues a commit, NM will not be able to rollback.
Of course the AM is still free to call 'upgrade' again, with an old launch context.
By default, autoCommit is 'true' which means, as soon as the container is upgraded, it is also committed.
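The semantics described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual NM code: the class `UpgradableContainerSketch`, its methods, and the use of a plain string in place of a launch context are all hypothetical.

```java
import java.util.Optional;

/** Hypothetical model of upgrade/commit/rollback bookkeeping; not the NM's actual classes. */
class UpgradableContainerSketch {
    private String currentLaunchContext;
    // Previous launch context, kept only while a rollback is still possible.
    private Optional<String> rollbackContext = Optional.empty();

    UpgradableContainerSketch(String initialLaunchContext) {
        this.currentLaunchContext = initialLaunchContext;
    }

    /** Switch to a new launch context; with autoCommit the rollback state is dropped at once. */
    void upgrade(String newLaunchContext, boolean autoCommit) {
        rollbackContext = Optional.of(currentLaunchContext);
        currentLaunchContext = newLaunchContext;
        if (autoCommit) {
            commit();
        }
    }

    /** The AM accepts the upgrade: discard everything needed to roll back. */
    void commit() {
        rollbackContext = Optional.empty();
    }

    boolean canRollback() {
        return rollbackContext.isPresent();
    }

    /** Restore the previous launch context; fails once the upgrade has been committed. */
    void rollback() {
        String previous = rollbackContext.orElseThrow(
            () -> new IllegalStateException("Nothing to roll back to: upgrade already committed"));
        currentLaunchContext = previous;
        rollbackContext = Optional.empty();
    }

    String currentLaunchContext() {
        return currentLaunchContext;
    }
}
```

In this model, an AM that wants the old version back after a commit has no rollback available and simply calls `upgrade` again with the old launch context.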
One implication of this API is that we'll have to persist the commit state for NM recovery later on.
Yes, we would. I plan to open a JIRA to address NMStateStore changes for this as well as
Also, should the rollback API always be able to roll back?
Once commit has been called, you cannot roll back. The AM would have to explicitly call the upgrade API again with the previous launchContext.
ContainerLaunchContext already has the ContainerRetryContext. Can we reuse that retryContext?
I wanted to distinguish between the retry policy used to retry a failed container and the policy used to decide failure retries during upgrades. It is possible both can be the same. I just put that argument there in the upgrade() API to make it explicit.
The ContainerImpl#ContainerRetryContext is not updated to new value on upgrade.
This is fixed in the latest
RetryFailureTransition: it's a bit complicated. Is it possible to simplify it to something like below:
I refactored it a bit. Let me know if it's ok.