Details
-
Sub-task
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.14.5, 1.15.3
-
None
-
None
Description
Currently client submit olap query to flink session cluster via http rest api, and pull the results through interval polling. The sink task in TaskManager creates socket server for each query, when the JobManager receives the pull request from client, it requests query results from the socket server. The process is as follows
Job submission path:
client -> http rest -> JobManager -> Sink Socket Server
Result acquisition path:
client <- http rest <- JobManager <- Sink Socket Server
This leads to two problems
1. There will be some performance loss when submitting jobs through http rest, for example, temporary files will be created for each job
2. The client pulls the result data at a certain time interval, which is a fixed cost. The larger interval leads to increase query latency, the smaller interval will increase the pressure of Dispatcher.
3. Each sink task initializes a socket server, it will increase the query latency, on the other hand, it wastes resources.
For the Flink OLAP scenario, we propose to add websocket protocol in session cluster to support submitting jobs and returning results. The client creates and manage a connection with websocket server, submits olap query to session cluster. The TaskManagers create and manage connection to websocket server too, and sink task sends results to the server in stream. When the JobManager receives the results from sink task, it pushes the result data to the client through the connection between them.
We implemented this feature in the internal Flink version of ByteDance. On average, the latency of each query can be reduced by about 100ms, it's a big optimization for OLAP queries.