  1. Flink
  2. FLINK-13938

Use pre-uploaded libs to accelerate flink submission

Details

    Description

      Currently, every time a Flink cluster is started, the Flink lib jars must be uploaded to HDFS and registered as YARN local resources so that they can be downloaded to the JobManager and all TaskManager containers. I think two optimizations are possible.

      1. Use a pre-uploaded Flink binary to avoid uploading the Flink system jars.
      2. By default, the LocalResourceVisibility is APPLICATION, so the jars are downloaded only once per node and shared by all TaskManager containers of the same application on that node. However, different applications still have to download all jars every time, including flink-dist.jar. We could use the YARN public cache to eliminate the unnecessary jar downloads and make container launch faster.
         

      How does the feature work?

      • Add yarn.provided.lib.dirs to configure pre-uploaded libs, which contain files that are useful for all users of the platform (i.e. different applications).
      • When the Flink client wants to ship a local file, it checks the provided libs first. If the provided libs contain a file with the same name, the local ship file is automatically excluded from uploading.
      • These provided libs need to be publicly readable and will be set with PUBLIC visibility for local resources, so they will be cached on the nodes and shared by different applications.
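
The name-based exclusion described above can be illustrated with a small shell sketch. This is not the Flink client's actual code; `files_to_upload` and the sample file names are hypothetical, and the only behavior taken from the description is that a local ship file is skipped when a provided file has the same name.

```shell
#!/bin/sh
# Hypothetical helper (not part of Flink): print the local ship files that
# would still need uploading.
#   $1   newline-separated file names found in the provided lib dirs
#   $2.. local files the client wants to ship
files_to_upload() {
  provided_names=$1
  shift
  for f in "$@"; do
    # Compare by file name only, as the description states.
    name=$(basename "$f")
    if ! printf '%s\n' "$provided_names" | grep -qxF -- "$name"; then
      printf '%s\n' "$f"
    fi
  done
}

# Illustrative names; assume these two jars already exist in the provided libs.
example_provided='flink-dist.jar
log4j.jar'
files_to_upload "$example_provided" lib/flink-dist.jar lib/log4j.jar usrlib/my-udf.jar
```

With these sample inputs only usrlib/my-udf.jar is printed, because the other two names are already covered by the provided libs.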

       

      How to use the pre-upload feature?
      1. First, upload the Flink binary to HDFS directories.
      2. Use yarn.provided.lib.dirs to specify the pre-uploaded libs.
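
Step 1 could be done roughly as follows. This is a sketch under assumptions: `upload_flink_libs` and `HDFS_CMD` are hypothetical names, `FLINK_HOME` is assumed to point at an unpacked Flink distribution, and the target path reuses the hdfs://myhdfs/flink example from this issue. The chmod reflects the requirement that provided libs be publicly readable; the parent directories may also need to be accessible to all users for the YARN public cache to accept the files.

```shell
#!/bin/sh
# Assumption: the standard Hadoop "hdfs" CLI is on PATH (overridable via HDFS_CMD).
HDFS_CMD=${HDFS_CMD:-hdfs}

# Hypothetical helper: copy lib/ and plugins/ from $FLINK_HOME to an HDFS directory.
#   $1  target HDFS directory, e.g. hdfs://myhdfs/flink
upload_flink_libs() {
  target=$1
  # Create the target directory if it does not exist yet.
  "$HDFS_CMD" dfs -mkdir -p "$target"
  # Upload both directories; -f overwrites files that already exist.
  "$HDFS_CMD" dfs -put -f "$FLINK_HOME/lib" "$FLINK_HOME/plugins" "$target/"
  # Make the uploaded files readable by all users.
  "$HDFS_CMD" dfs -chmod -R 755 "$target"
}
```

Calling `upload_flink_libs hdfs://myhdfs/flink` with `FLINK_HOME` set would then populate the two directories that the yarn.provided.lib.dirs setting points at.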
       
      A final submission command could look like the following.

      ./bin/flink run -m yarn-cluster -d \
      -yD yarn.provided.lib.dirs=hdfs://myhdfs/flink/lib,hdfs://myhdfs/flink/plugins \
      examples/streaming/WindowJoin.jar
      

            People

              wangyang0918 Yang Wang

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 0.5h