I have been trying to think of resources or resource requests that do not really fit into this model, and I have come up with a few. I don't know whether they should be supported; I am including them here mostly for discussion, to see what others think.
- optional resources. For example, I have a mapper that could really be sped up by access to a GPU, but it can run without one. So if a GPU is free I would like to use it, but if it isn't available I am OK going without. This opens up a bit of a can of worms, because how do I indicate in the request how much effort the scheduler should put into giving me access to that resource before giving up?
- a subclass of this might be flexible resources. I would love to have 10GB of RAM, since it would really speed up my process with all of the caching I would like to do, but I can get by with as little as 2GB.
- data locality/network-accessible resources. I want to be near a given resource, but I don't necessarily need to be on the same box as it. This could be used when reading from or writing to a filer, a DB server, or even a different, yet-to-be-scheduled container. This is a bit tricky because we already try to do this, but in a different way: with each container we request a node/rack that we would like to run on. In reality, at least for map reduce, we are requesting an HDFS block that we would like to run near, but there are too many blocks to put them into a resource model effectively. Instead it was generalized, and that would probably work for most things, except for the case of wanting to be near a yet-to-be-scheduled container. Even that would work, so long as we only launch containers one at a time.
- global resources. Things like bandwidth between datacenters for distcp copies. Perhaps these could be modeled as cluster-wide resources, but the APIs would have to be set up assuming that the free/max amounts can change without any input from the RM. That means the scheduler may need to be able to deal with asking for those resources and finding that they are no longer available.
- default resources. For backwards compatibility with existing requests, we may want some resources, like CPU, to have a default value when one is not explicitly given.
- tiered or dependent resources. With network, for example, if I am doing a distcp across colos, I am not only using up network bandwidth on the box I am running on; I am also using up bandwidth on the rack switch and on the link between the colos. I am not a network engineer, so if the example is completely wrong, hopefully the general concept still holds. Do we want to somehow be able to tie different resources together, so that a request for one thing (inter-colo bandwidth) also implies a request for other resources?
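The optional and flexible cases above could be captured by letting a request carry both a hard floor and a soft ideal. This is a minimal sketch of that idea; `ResourceRequest` and `grant` are invented names for illustration, not part of any YARN API:

```python
# Hypothetical sketch: a request carries a hard minimum and a soft
# ideal, plus an "optional" flag for resources (like a GPU) that the
# container can run without. None of this is a real YARN API.
from dataclasses import dataclass

@dataclass
class ResourceRequest:
    name: str
    minimum: int       # must have at least this much, unless optional
    ideal: int         # would like up to this much
    optional: bool = False

def grant(request, available):
    """How much of the resource to grant, or None if it can't be met."""
    if available >= request.ideal:
        return request.ideal
    if available >= request.minimum:
        return available           # flexible: take what is free
    if request.optional:
        return 0                   # optional: run without it
    return None                    # hard requirement not met

# A mapper that wants 10GB of RAM but can get by with 2GB:
ram = ResourceRequest("memory-mb", minimum=2048, ideal=10240)
print(grant(ram, 4096))    # 4096: less than ideal, above the floor

# An optional GPU:
gpu = ResourceRequest("gpu", minimum=1, ideal=1, optional=True)
print(grant(gpu, 0))       # 0: run without it
```

This still leaves open the question raised above of how much effort the scheduler should spend before falling back to the floor or to zero.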
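The tiered/dependent case could be modeled as a hierarchy, where a request against a leaf resource expands into implied requests against every link above it. A sketch, with the topology and all resource names invented for illustration:

```python
# Hypothetical sketch of "dependent" resources: requesting bandwidth
# on a node uplink also implies requests on the rack switch and the
# inter-colo link above it. Topology and names are made up.
PARENT = {
    "node1-uplink": "rack1-switch",
    "rack1-switch": "colo-a-uplink",
    "colo-a-uplink": None,           # top of the hierarchy
}

def expand(resource, amount):
    """Expand one request into the chain of implied requests."""
    implied = []
    while resource is not None:
        implied.append((resource, amount))
        resource = PARENT.get(resource)
    return implied

print(expand("node1-uplink", 100))
# [('node1-uplink', 100), ('rack1-switch', 100), ('colo-a-uplink', 100)]
```

A real scheduler would also need to decide whether the implied amounts are the same at every tier, since an oversubscribed rack switch does not consume bandwidth one-to-one.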
I also have a few questions.
How are the plug-ins intended to adapt to different operating environments? Determining the amount of free CPU on Windows is very different from doing it on Linux. Would we have different plug-ins for different OSes? Would it be up to the end user to configure it all appropriately, or would we try to auto-detect the environment and adjust automatically?
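One way to combine the two options is to let explicit configuration win and auto-detect otherwise. A sketch, assuming hypothetical per-OS plug-in classes (nothing here reflects an actual plug-in interface):

```python
# Hedged sketch of per-OS plug-in selection: explicit configuration
# takes precedence, otherwise detect the running OS. The plug-in
# classes are invented stand-ins for real CPU-probing code.
import platform

class LinuxCpuPlugin:
    pass

class WindowsCpuPlugin:
    pass

def cpu_plugin(configured=None):
    """Pick a CPU plug-in: explicit config wins, else auto-detect."""
    if configured is not None:
        return configured()
    if platform.system() == "Windows":
        return WindowsCpuPlugin()
    return LinuxCpuPlugin()   # default for Linux and similar
```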
Do we want to handle enforcement of the resource limits as well? We currently kill containers that use more memory than requested (with some leeway). What about other things like CPU or network: do we kill the container if it goes over, do we use the OS to restrict the amount used, do we warn the user that they went over, or do we just ignore it and hope everyone is a good citizen?
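The four options above could even be a per-resource policy rather than a single global answer. A minimal sketch, with invented policy names mirroring the choices in the text (kill, OS-level throttle, warn, ignore):

```python
# Hedged sketch: one enforcement policy per resource, consulted when
# a container exceeds its request. The policy table and "leeway"
# factor are illustrative, not an existing YARN mechanism.
from enum import Enum

class Enforcement(Enum):
    KILL = "kill"          # what we do for memory today
    THROTTLE = "throttle"  # e.g. let the OS cap CPU usage
    WARN = "warn"
    IGNORE = "ignore"      # hope everyone is a good citizen

POLICY = {"memory-mb": Enforcement.KILL, "vcores": Enforcement.THROTTLE}

def on_overuse(resource, used, requested, leeway=1.0):
    """Return the action to take, or None if still within limits."""
    if used <= requested * leeway:
        return None
    return POLICY.get(resource, Enforcement.IGNORE)

print(on_overuse("memory-mb", 2200, 2048, leeway=1.05))
# Enforcement.KILL
```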