Flink / FLINK-30101

Always use StandaloneClientHAServices to create RestClusterClient when retrieving a Flink on YARN cluster client


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Do
    • 1.16.0
    • 1.17.0
    • None

    Description

      Problem

      Currently, a Flink on YARN cluster client is retrieved as follows (in the YarnClusterDescriptor#retrieve method):

      1. Get the application report from YARN.
      2. Set rest.address & rest.port using the information from the application report.
      3. Create a new RestClusterClient from the updated configuration. If HA is enabled, the client uses the client HA services to fetch rest.address & rest.port.
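      The three steps above can be modeled roughly as follows. This is a minimal, self-contained sketch, not Flink code: AppReport, lookupLeaderAddress and the zkLookups counter are hypothetical stand-ins for the YARN application report, the ZooKeeper-backed client HA lookup, and its cost. It only shows that, with HA enabled, step 3 resolves an address that step 2 already put into the configuration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the current retrieve flow; none of these types are Flink classes.
public class CurrentRetrieveFlow {

    // Stand-in for the host/port taken from the YARN application report (step 1).
    record AppReport(String host, int port) {}

    // Counts simulated ZooKeeper lookups so the redundant work is visible.
    static int zkLookups = 0;

    // Stand-in for the client HA lookup; in a real cluster this would contact ZooKeeper.
    static String lookupLeaderAddress(String alreadyConfiguredAddress) {
        zkLookups++;
        // The leader's REST endpoint is the same one YARN already reported.
        return alreadyConfiguredAddress;
    }

    static String retrieve(AppReport report, boolean haEnabled) {
        // Step 2: copy the reported endpoint into the configuration.
        Map<String, String> conf = new HashMap<>();
        conf.put("rest.address", report.host());
        conf.put("rest.port", Integer.toString(report.port()));
        String configured = "http://" + conf.get("rest.address") + ":" + conf.get("rest.port");

        // Step 3: with HA enabled, the endpoint is resolved a second time via HA services.
        return haEnabled ? lookupLeaderAddress(configured) : configured;
    }

    public static void main(String[] args) {
        String url = retrieve(new AppReport("node-1", 8081), true);
        // prints: http://node-1:8081 (ZooKeeper lookups: 1)
        System.out.println(url + " (ZooKeeper lookups: " + zkLookups + ")");
    }
}
```

      In both branches the returned endpoint is identical; the HA branch only adds the lookup, which is the cost described below.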

      The use of client HA in step 3 is redundant, as rest.address & rest.port have already been obtained from the YARN application report. When ZooKeeper HA is enabled, initializing the client HA services and fetching the REST address and port takes about 1.5 s.

      Those 1.5 s matter for latency-sensitive client operations. At my company, we use the Flink client to submit short-running jobs to session clusters, where end-to-end latency is critical. Job submission takes about 10 s on average, so removing this step would save about 15% of the submission time.

      Proposal

      When retrieving a Flink on YARN cluster client, create the RestClusterClient with StandaloneClientHAServices instead, since rest.address & rest.port have already been fetched from the YARN application report. KubernetesClusterDescriptor already does this.
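      A minimal sketch of the proposed behavior, assuming a simplified model rather than Flink's real classes: ClientHaServices, StaticHaServices, RestClient and AppReport are all hypothetical names. StaticHaServices plays the role the proposal assigns to StandaloneClientHAServices, returning a fixed, pre-fetched address with no ZooKeeper lookup. The exact Flink constructor signatures are deliberately not reproduced here.

```java
// Hypothetical sketch of the proposal; these are not Flink types.
public class ProposedRetrieveFlow {

    // Minimal stand-in for client HA services: something that yields the REST endpoint.
    interface ClientHaServices {
        String webMonitorAddress();
    }

    // Plays the role of StandaloneClientHAServices: a fixed address, no lookup.
    record StaticHaServices(String address) implements ClientHaServices {
        public String webMonitorAddress() {
            return address;
        }
    }

    // Stand-in for the host/port taken from the YARN application report.
    record AppReport(String host, int port) {}

    // Hypothetical client that only needs HA services to find its REST endpoint.
    record RestClient(ClientHaServices ha) {
        String endpoint() {
            return ha.webMonitorAddress();
        }
    }

    // Proposed retrieve: always build static HA services from the report,
    // whether or not ZooKeeper HA is configured for the cluster.
    static RestClient retrieve(AppReport report) {
        String address = "http://" + report.host() + ":" + report.port();
        return new RestClient(new StaticHaServices(address));
    }

    public static void main(String[] args) {
        RestClient client = retrieve(new AppReport("node-1", 8081));
        // prints: http://node-1:8081
        System.out.println(client.endpoint());
    }
}
```

      The design choice is that the client trusts the endpoint YARN reports at retrieval time, so no HA services need to be initialized before the first request.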


          People

            Assignee: Unassigned
            Reporter: Zhanghao Chen
            Votes: 0
            Watchers: 2
