Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
1. SlotPool requests slots from the ResourceManager (RM) when it does not have enough slots.
2. If a slot request is not fulfilled within a certain time, SlotPool treats the request as timed out and sends a new slot request by triggering a failover in JobMaster. The previous request is no longer needed, but the RM does not know that.
3. This may cause the RM to request far more resources than the job really needs.
For example:
1. A job needs 100 slots. The RM requests 100 containers from YARN.
2. But YARN is busy and has no resources for the job.
3. The job fails over because the resource request is not fulfilled in time.
4. It asks for 100 slots again, so the RM has now requested 200 containers from YARN.
5. If the failover happens several times, the number of container requests keeps growing.
6. When YARN finally has resources, it finds that the job appears to need thousands of containers. This is a waste of resources.
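
The growth in the example above can be sketched with a small simulation. This is only an illustrative model, not Flink code: the function name pending_containers and the cancel_stale flag are hypothetical, and the cancel_stale=True branch simply assumes that the stale request could be withdrawn before the retry, which is one possible remedy and not necessarily how the issue was fixed.

```python
def pending_containers(slots_needed: int, failovers: int, cancel_stale: bool) -> int:
    """Count container requests outstanding at the RM after the initial
    request plus `failovers` timed-out retries (hypothetical model)."""
    pending = 0
    for _ in range(failovers + 1):
        if cancel_stale:
            # Assumption: the RM is told that the previous request is obsolete.
            pending = 0
        # Each attempt asks for the full number of slots again.
        pending += slots_needed
    return pending


if __name__ == "__main__":
    # Behaviour described in this issue: stale requests are never withdrawn.
    for failovers in (0, 1, 9):
        print(failovers, pending_containers(100, failovers, cancel_stale=False))
    # Hypothetical behaviour if stale requests were withdrawn.
    print(9, pending_containers(100, 9, cancel_stale=True))
```

With stale requests left in place, the outstanding count grows linearly with the number of failovers (100 slots and 9 failovers give 1000 pending containers), whereas withdrawing the old request keeps it at the 100 the job actually needs.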
Issue Links
- blocks: FLINK-4344 Implement new JobManager (Resolved)