Thanks Karthik Kambatla, Wangda Tan, Vinod Kumar Vavilapalli and Steve Loughran for taking a look at our proposal. PFA updated doc (v2) that incorporates your feedback.
A few additional clarifications (I have addressed the rest of the comments directly in the updated doc):
While making these changes, would it be possible to address YARN-314 too?
I'm okay if we can get two in a shot, but I'd caution against risking this effort by blowing up the size.
We will address YARN-314, provided applications specify the Request-ID, since that lets them ask for multiple containers at the same priority through independent requests.
Why are we putting priority semantics onto the ID? We should just follow the existing priority ordering.
We will continue to follow the existing priority ordering. However, as explained above, the proposed enhancement lets a user make multiple requests at the same priority (YARN-314). In that case, we will simply allocate containers to those requests in FIFO order.
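The ordering described in this reply can be illustrated with a minimal Python sketch. It is not the YARN scheduler code. `PendingRequest` and `RequestQueue` are hypothetical names. The sketch assumes that a smaller priority value is served first, and that "FIFO" means order of submission rather than order of Request-ID. The Request-ID is used only to route granted containers back to the request that asked for them, so it carries no priority semantics.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class PendingRequest:
    # Sort key: priority first (assumption: smaller value is served first),
    # then arrival sequence so same-priority requests are served FIFO.
    priority: int
    seq: int
    # Application-chosen Request-ID; excluded from ordering on purpose.
    request_id: int = field(compare=False)
    num_containers: int = field(compare=False)


class RequestQueue:
    """Hypothetical scheduler-side queue of outstanding requests."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def add(self, request_id, priority, num_containers):
        # Each submission gets the next arrival number, independent of its ID.
        heapq.heappush(
            self._heap,
            PendingRequest(priority, next(self._arrival), request_id, num_containers),
        )

    def allocate(self, n):
        """Hand out up to n containers; return (request_id, count) grants."""
        grants = []
        while n > 0 and self._heap:
            head = self._heap[0]
            take = min(n, head.num_containers)
            grants.append((head.request_id, take))
            head.num_containers -= take  # does not affect heap order
            n -= take
            if head.num_containers == 0:
                heapq.heappop(self._heap)
        return grants
```

Suppose requests 7 (two containers) and 3 (one container) are added at priority 1, and then request 9 (one container) is added at priority 0. A call to `allocate(3)` would first serve request 9. It would then serve request 7 ahead of request 3, because request 7 arrived earlier, even though its ID is larger.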
BTW, for the federation-related issue, does the client library always need to generate these IDs? How does that interact with application-generated IDs?
In Federation too, we expect applications to generate the IDs. For example, we will work with the REEF team (and the long-running service AM proposed as part of YARN-4692) to start specifying IDs for their allocation requests.
I would also like to see if the allocated containers could support a role ID field too...nothing much, but enough that on an AM restart their role can be determined. That one, I'd keep separate from the request ID; they serve slightly different purposes. (I could have 5 requests outstanding for containers of role 4; I'd want to track those requests)
I agree that having an explicit role ID is useful, but I feel it's outside the scope of this JIRA, which IIUC is what you are also observing. I think adding a role ID should be part of YARN-4692.