Flink / FLINK-25318 Improvement of scheduler and execution for Flink OLAP / FLINK-25338

Improvement of connection from TM to JM in session cluster

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.12.7, 1.13.5, 1.14.2
    • Fix Version/s: None
    • Component/s: Runtime / Coordination
    • Labels: None

    Description

      When a TaskManager receives a slot request from the ResourceManager for a specific job, it connects to the JobMaster at the given job address. Over this connection the TaskManager registers itself, monitors the job's heartbeat, and reports task state updates. There is no need for a TaskManager to create a separate connection for each job, and when the TaskManager is busy, doing so increases the latency of the job.

      One idea is that the TaskManager manages a single connection to the `Dispatcher` and sends events such as heartbeats and state updates to it, and the `Dispatcher` then passes them on to the local `JobMaster`. The main problem is that the `Dispatcher` is an actor and can only execute in one thread, so deserializing events may become a performance bottleneck.
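
      The routing step in this idea can be sketched roughly as follows. This is a hypothetical illustration, not Flink code: `EventRoutingSketch`, `Router`, `TaskManagerEvent`, and the event type strings are all invented names, and the sketch assumes each event carries the ID of the job it belongs to so that one connection can serve many jobs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical sketch: none of these names exist in Flink.
final class EventRoutingSketch {

    /** An event sent by a TaskManager, tagged with the job it belongs to. */
    static final class TaskManagerEvent {
        final String jobId;
        final String type;     // e.g. "HEARTBEAT" or "TASK_STATE_UPDATE" (invented labels)
        final String payload;

        TaskManagerEvent(String jobId, String type, String payload) {
            this.jobId = jobId;
            this.type = type;
            this.payload = payload;
        }
    }

    /** Stand-in for the Dispatcher: hands each event to the handler of its job. */
    static final class Router {
        private final Map<String, Consumer<TaskManagerEvent>> jobHandlers =
                new ConcurrentHashMap<>();

        /** Registers the handler (stand-in for a local JobMaster) for a job. */
        void registerJob(String jobId, Consumer<TaskManagerEvent> handler) {
            jobHandlers.put(jobId, handler);
        }

        /** Returns true if a handler for the event's job was found and invoked. */
        boolean route(TaskManagerEvent event) {
            Consumer<TaskManagerEvent> handler = jobHandlers.get(event.jobId);
            if (handler == null) {
                return false; // unknown job: the caller decides how to react
            }
            handler.accept(event);
            return true;
        }
    }

    public static void main(String[] args) {
        Router router = new Router();
        router.registerJob("job-a", e -> System.out.println("job-a got " + e.type));
        router.route(new TaskManagerEvent("job-a", "HEARTBEAT", ""));
        router.route(new TaskManagerEvent("job-b", "HEARTBEAT", "")); // no handler, ignored
    }
}
```

      If `route` is only ever called from one thread, as it would be inside an actor, every event from every TaskManager is handled serially there, which is where the bottleneck described above would come from.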

      The other idea is to create a Netty service in `SessionClusterEntrypoint` that receives and deserializes events from TaskManagers in a thread pool and then sends each event to the `Dispatcher` or `JobMaster`. Each TaskManager establishes its connection to the Netty service when it starts. Such a service could later also receive the result of a job from a TaskManager.
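
      The threading model of this second idea can be sketched as below. It is an assumption-laden illustration rather than a proposed implementation: plain `java.util.concurrent` executors stand in for the Netty service and for the single-threaded `Dispatcher` or `JobMaster`, and `PooledDeserializationSketch`, its methods, and the `jobId|type` wire format are invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: executors stand in for Netty and for the actor thread.
final class PooledDeserializationSketch {

    /** Toy wire format "jobId|type"; a real service would use a proper serializer. */
    static String[] deserialize(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).split("\\|", 2);
    }

    /** Decodes messages on a pool, then applies each one on a single "main" thread. */
    static List<String> process(List<byte[]> rawMessages, int poolSize) throws Exception {
        ExecutorService decoders = Executors.newFixedThreadPool(poolSize);
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        List<String> handled = new ArrayList<>(); // only modified on mainThread
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (byte[] raw : rawMessages) {
                pending.add(CompletableFuture
                        // The expensive part runs in parallel on the decoder pool.
                        .supplyAsync(() -> deserialize(raw), decoders)
                        // Only the already-decoded event reaches the single thread.
                        .thenAcceptAsync(e -> handled.add(e[0] + ":" + e[1]), mainThread));
            }
            // Wait for all events; a malformed message surfaces here as an exception.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]))
                    .get(10, TimeUnit.SECONDS);
            return handled;
        } finally {
            decoders.shutdown();
            mainThread.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<byte[]> raw = new ArrayList<>();
        raw.add("job-a|HEARTBEAT".getBytes(StandardCharsets.UTF_8));
        raw.add("job-b|TASK_STATE_UPDATE".getBytes(StandardCharsets.UTF_8));
        // The order of the handled events may vary between runs.
        System.out.println(process(raw, 4));
    }
}
```

      The point of the split is that the single-threaded component only sees decoded events, so the deserialization cost no longer serializes on that one thread.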

      xtsong, what do you think? Thanks.

          People

            Assignee: Unassigned
            Reporter: zjureel (Fang Yong)
            Votes: 0
            Watchers: 6
