Details
- Type: New Feature
- Status: Closed
- Priority: Major
- Resolution: Implemented
- Fix Version: 1.11.0
Description
Currently, every time we start a Flink cluster, the Flink lib jars need to be uploaded to HDFS and then registered as YARN local resources so that they can be downloaded to the JobManager and all TaskManager containers. I think we could make two optimizations.
- Use a pre-uploaded Flink binary to avoid uploading the Flink system jars.
- By default the LocalResourceVisibility is APPLICATION, so the jars are downloaded only once per node and shared by all TaskManager containers of the same application on that node. However, different applications still have to download all the jars every time, including flink-dist.jar. We could use the YARN public cache to eliminate this unnecessary downloading and make container launch faster.
How does the feature work?
- Add yarn.provided.lib.dirs to configure pre-uploaded lib directories, which contain files that are useful to all users of the platform (i.e. different applications).
- When the Flink client is about to ship a local file, it first checks the provided libs. If the provided libs contain a file with the same name, the local file is automatically excluded from uploading.
- These provided libs need to be publicly readable and will be registered as local resources with PUBLIC visibility, so they are cached on the nodes and shared by different applications.
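The exclusion step above can be illustrated with a small shell sketch. This is not Flink's actual client code (which is Java), and all directory and jar names here are made up; it only shows the name-based rule: a local ship file is skipped when a provided lib already contains a file with the same name.

```shell
# Hypothetical layout: $work/provided stands in for yarn.provided.lib.dirs,
# $work/local for the client's local ship files.
work=$(mktemp -d)
mkdir -p "$work/provided" "$work/local"
touch "$work/provided/flink-dist.jar" "$work/provided/log4j-api.jar"
touch "$work/local/flink-dist.jar" "$work/local/my-job-udfs.jar"

ls "$work/local" | sort > "$work/local.txt"
ls "$work/provided" | sort > "$work/provided.txt"
# Only names present locally but absent from the provided libs get uploaded;
# flink-dist.jar is excluded because the provided dir already has it.
to_upload=$(comm -23 "$work/local.txt" "$work/provided.txt")
echo "$to_upload"
rm -rf "$work"
```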
How to use the pre-upload feature?
1. First, upload the Flink binary (its lib and plugins directories) to HDFS.
2. Then use yarn.provided.lib.dirs to point to the pre-uploaded directories.
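The upload step might look like the following on a typical cluster; the hdfs://myhdfs paths match the submission example below, but your cluster's paths and permission scheme will differ.

```shell
# Create target dirs and upload the Flink lib and plugins directories once.
hdfs dfs -mkdir -p hdfs://myhdfs/flink/lib hdfs://myhdfs/flink/plugins
hdfs dfs -put lib/* hdfs://myhdfs/flink/lib/
hdfs dfs -put plugins/* hdfs://myhdfs/flink/plugins/
# The dirs must be world-readable, since YARN caches PUBLIC local resources
# only when every user on the cluster can read them.
hdfs dfs -chmod -R 755 hdfs://myhdfs/flink
```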
A submission command can then be issued as follows.
./bin/flink run -m yarn-cluster -d \
-yD yarn.provided.lib.dirs=hdfs://myhdfs/flink/lib,hdfs://myhdfs/flink/plugins \
examples/streaming/WindowJoin.jar
Issue Links
- is related to:
  - FLINK-17632 Support to specify a remote path for job jar (Closed)
  - FLINK-17472 StreamExecutionEnvironment and ExecutionEnvironment in Yarn mode (Closed)
  - FLINK-14964 Support configure remote flink jar (Closed)