Normally a UDF only creates small, short-lived objects that the JVM garbage collector reclaims quickly, so most memory is controlled and managed by the TaskManager framework. In some special cases, however, a UDF may create large, long-lived objects that consume significant resources, so advanced users need options to declare the resource usage of their operators when necessary.
The basic approach is the following:
- Introduce the ResourceSpec structure to describe the different resource factors (CPU cores, heap memory, direct memory, native memory, etc.) and provide basic methods for constructing resource groups.
- The ResourceSpec can be set separately on the internal transformation in DataStream and on the base operator in DataSet.
- The StateBackend should provide a method to estimate memory usage based on the state-size hint in the ResourceSpec.
- During job graph generation, the ResourceSpecs of chained operators will be aggregated.
- When the JobManager requests a slot from the ResourceManager to deploy a task, the ResourceProfile will be extended to correspond to the ResourceSpec.
- When the ResourceManager requests container resources from the cluster, it should account for the extra framework memory in addition to the slot ResourceProfiles.
- The framework memory is mainly used by the NetworkBufferPool and MemoryManager in the TaskManager, and it can be configured at the job level.
- Apart from resources, the JVM options attached to the container should also be supported and configurable at the job level.
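The ResourceSpec structure and its aggregation for chained operators, as described in the list above, can be sketched as a small immutable value type. This is a minimal illustration under stated assumptions, not Flink's actual class: the field names, the megabyte units, the `stateSizeMB` hint field, the `merge` method and the `aggregateChain` helper are all hypothetical, and summing every factor is only an assumed aggregation rule.

```java
import java.util.List;

// Hypothetical sketch of a ResourceSpec-like value type; not Flink's actual class.
// All memory amounts are in megabytes.
final class ResourceSpecSketch {

    static final class ResourceSpec {
        final double cpuCores;
        final int heapMemoryMB;
        final int directMemoryMB;
        final int nativeMemoryMB;
        final int stateSizeMB; // hint later consumed by a StateBackend estimate

        ResourceSpec(double cpuCores, int heapMemoryMB, int directMemoryMB,
                     int nativeMemoryMB, int stateSizeMB) {
            if (cpuCores < 0 || heapMemoryMB < 0 || directMemoryMB < 0
                    || nativeMemoryMB < 0 || stateSizeMB < 0) {
                throw new IllegalArgumentException("resource values must be non-negative");
            }
            this.cpuCores = cpuCores;
            this.heapMemoryMB = heapMemoryMB;
            this.directMemoryMB = directMemoryMB;
            this.nativeMemoryMB = nativeMemoryMB;
            this.stateSizeMB = stateSizeMB;
        }

        // Combines two specs by summing every resource factor (assumed aggregation rule).
        ResourceSpec merge(ResourceSpec other) {
            return new ResourceSpec(
                    cpuCores + other.cpuCores,
                    heapMemoryMB + other.heapMemoryMB,
                    directMemoryMB + other.directMemoryMB,
                    nativeMemoryMB + other.nativeMemoryMB,
                    stateSizeMB + other.stateSizeMB);
        }
    }

    static final ResourceSpec ZERO = new ResourceSpec(0, 0, 0, 0, 0);

    // Aggregates the specs of all operators chained into a single task.
    static ResourceSpec aggregateChain(List<ResourceSpec> chainedOperators) {
        ResourceSpec total = ZERO;
        for (ResourceSpec spec : chainedOperators) {
            total = total.merge(spec);
        }
        return total;
    }

    public static void main(String[] args) {
        ResourceSpec source = new ResourceSpec(0.5, 128, 32, 0, 0);
        ResourceSpec window = new ResourceSpec(1.0, 512, 64, 256, 1024);
        ResourceSpec task = aggregateChain(List.of(source, window));
        System.out.println(task.cpuCores + " cores, " + task.heapMemoryMB + " MB heap");
    }
}
```

Summing is the simplest rule for memory, since every chained operator's objects coexist in the same task. For CPU, chained operators share one thread, so an implementation might reasonably choose a different rule (such as the maximum); that choice is left open here.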
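The list above only states that the StateBackend should estimate memory usage from the state-size hint; it does not define the method. The following is a minimal sketch of what such a hook could look like, assuming a hypothetical `StateMemoryEstimator` interface. Both estimation formulas are placeholders: the heap overhead factor and the off-heap cache budget would have to be measured, not taken from this example.

```java
// Hypothetical sketch: the interface and the formulas below are assumptions,
// not an existing StateBackend API.
final class StateMemoryEstimateSketch {

    static final class MemoryEstimate {
        final long heapMB;
        final long nativeMB;

        MemoryEstimate(long heapMB, long nativeMB) {
            this.heapMB = heapMB;
            this.nativeMB = nativeMB;
        }
    }

    // Hypothetical hook each state backend would implement.
    interface StateMemoryEstimator {
        MemoryEstimate estimate(long stateSizeMB);
    }

    // Heap-based backend: working state lives as Java objects on the heap, so the
    // hint is scaled by an object-overhead factor (placeholder value, must be measured).
    static final class HeapBackendEstimator implements StateMemoryEstimator {
        private final double overheadFactor;

        HeapBackendEstimator(double overheadFactor) {
            this.overheadFactor = overheadFactor;
        }

        @Override
        public MemoryEstimate estimate(long stateSizeMB) {
            return new MemoryEstimate((long) Math.ceil(stateSizeMB * overheadFactor), 0);
        }
    }

    // Off-heap backend (e.g. an embedded store such as RocksDB that keeps most state on
    // local disk): native memory is assumed to be capped by a fixed cache budget.
    static final class OffHeapBackendEstimator implements StateMemoryEstimator {
        private final long cacheBudgetMB;

        OffHeapBackendEstimator(long cacheBudgetMB) {
            this.cacheBudgetMB = cacheBudgetMB;
        }

        @Override
        public MemoryEstimate estimate(long stateSizeMB) {
            return new MemoryEstimate(0, Math.min(stateSizeMB, cacheBudgetMB));
        }
    }

    public static void main(String[] args) {
        StateMemoryEstimator heap = new HeapBackendEstimator(1.5);
        StateMemoryEstimator offHeap = new OffHeapBackendEstimator(256);
        System.out.println("heap backend: " + heap.estimate(1024).heapMB + " MB heap");
        System.out.println("off-heap backend: " + offHeap.estimate(1024).nativeMB + " MB native");
    }
}
```

Returning heap and native memory separately lets the result be added to the matching ResourceSpec dimensions instead of being folded into a single number.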
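Once the ResourceProfile carries the same dimensions as the ResourceSpec, as the list above proposes, a slot request can be checked dimension by dimension against the free slots. The sketch below assumes a hypothetical `Profile` class and a first-fit `findSlot` policy; neither is the actual JobManager or ResourceManager API.

```java
import java.util.List;

// Hypothetical sketch of matching a slot request against offered slots; the class
// and method names are illustrative, not the JobManager/ResourceManager API.
final class SlotMatchingSketch {

    static final class Profile {
        final double cpuCores;
        final int heapMB;
        final int directMB;
        final int nativeMB;

        Profile(double cpuCores, int heapMB, int directMB, int nativeMB) {
            this.cpuCores = cpuCores;
            this.heapMB = heapMB;
            this.directMB = directMB;
            this.nativeMB = nativeMB;
        }

        // An offered slot satisfies a request only if it covers every dimension.
        boolean satisfies(Profile required) {
            return cpuCores >= required.cpuCores
                    && heapMB >= required.heapMB
                    && directMB >= required.directMB
                    && nativeMB >= required.nativeMB;
        }
    }

    // First-fit (an assumed policy): index of the first free slot that satisfies
    // the request, or -1 if none does.
    static int findSlot(List<Profile> freeSlots, Profile required) {
        for (int i = 0; i < freeSlots.size(); i++) {
            if (freeSlots.get(i).satisfies(required)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Profile> free = List.of(
                new Profile(1.0, 256, 64, 0),
                new Profile(2.0, 1024, 128, 512));
        Profile request = new Profile(1.5, 640, 96, 256);
        System.out.println("matched slot index: " + findSlot(free, request));
    }
}
```

A best-fit policy (choosing the smallest slot that still satisfies the request) would waste less capacity, at the cost of scanning all free slots.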
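The container request described in the list above (slot profiles plus job-level framework memory, plus job-level JVM options) can be illustrated with simple arithmetic. This sketch is hypothetical throughout: the class names are invented, the NetworkBufferPool memory is treated as direct memory and the MemoryManager memory as heap purely for illustration, and JVM overhead such as metaspace and thread stacks is ignored. `-Xmx` and `-XX:MaxDirectMemorySize` are standard HotSpot flags.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of container sizing; names and the heap/direct split are assumptions.
final class ContainerSizingSketch {

    static final class SlotProfile {
        final int heapMB;
        final int directMB;
        final int nativeMB;

        SlotProfile(int heapMB, int directMB, int nativeMB) {
            this.heapMB = heapMB;
            this.directMB = directMB;
            this.nativeMB = nativeMB;
        }
    }

    // Job-level framework memory: networkBufferMB backs the NetworkBufferPool (treated as
    // direct memory here); managedMemoryMB backs the MemoryManager (treated as heap here).
    static final class FrameworkMemory {
        final int networkBufferMB;
        final int managedMemoryMB;

        FrameworkMemory(int networkBufferMB, int managedMemoryMB) {
            this.networkBufferMB = networkBufferMB;
            this.managedMemoryMB = managedMemoryMB;
        }
    }

    static final class ContainerRequest {
        final int totalMemoryMB;
        final List<String> jvmArgs;

        ContainerRequest(int totalMemoryMB, List<String> jvmArgs) {
            this.totalMemoryMB = totalMemoryMB;
            this.jvmArgs = jvmArgs;
        }
    }

    static ContainerRequest size(List<SlotProfile> slots, FrameworkMemory framework,
                                 List<String> jobJvmOptions) {
        int heapMB = framework.managedMemoryMB;
        int directMB = framework.networkBufferMB;
        int nativeMB = 0;
        for (SlotProfile slot : slots) {
            heapMB += slot.heapMB;
            directMB += slot.directMB;
            nativeMB += slot.nativeMB;
        }
        List<String> jvmArgs = new ArrayList<>();
        jvmArgs.add("-Xmx" + heapMB + "m");
        jvmArgs.add("-XX:MaxDirectMemorySize=" + directMB + "m");
        jvmArgs.addAll(jobJvmOptions); // job-level JVM options attached to the container
        // JVM overhead (metaspace, thread stacks, ...) is ignored in this sketch.
        return new ContainerRequest(heapMB + directMB + nativeMB, jvmArgs);
    }

    public static void main(String[] args) {
        List<SlotProfile> slots = List.of(new SlotProfile(512, 64, 0), new SlotProfile(256, 32, 128));
        ContainerRequest request = size(slots, new FrameworkMemory(128, 256), List.of("-XX:+UseG1GC"));
        System.out.println(request.totalMemoryMB + " MB, args " + request.jvmArgs);
    }
}
```

With two slots and 384 MB of framework memory, the container asks for more than the slots alone would need, which is exactly the extra framework memory the ResourceManager has to account for.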
This feature will be implemented directly in the FLIP-6 branch.