Details
- Type: Improvement
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
(1) The current monitor is heavyweight.
- Backpressure monitoring works by repeatedly taking stack trace samples of the running tasks.
(2) It is difficult to find out which vertex is the source of backpressure.
- Users need to know the network metrics of both the current vertex and its upstream vertices to judge whether the current vertex is the source of backpressure. At present, users have to record this information themselves.
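To make point (1) concrete, the sketch below shows how a sampling-based monitor can turn stack trace samples into a backpressure ratio. It is an illustration only, not the code from this issue: the function names, the frame name used in the usage example, and the OK/LOW/HIGH thresholds are assumptions chosen for the example. The cost comes from the fact that every sample requires capturing a full stack trace of each running task.

```python
from typing import List, Sequence


def back_pressure_ratio(stack_samples: Sequence[List[str]], blocking_frame: str) -> float:
    """Fraction of samples in which the task was stuck in `blocking_frame`.

    Each sample is one stack trace, represented as a list of frame names.
    `blocking_frame` is a hypothetical name for the call in which a task
    waits for an output buffer.
    """
    if not stack_samples:
        return 0.0
    blocked = sum(1 for frames in stack_samples if blocking_frame in frames)
    return blocked / len(stack_samples)


def classify(ratio: float) -> str:
    """Map a ratio to a coarse level (thresholds are assumptions)."""
    if ratio <= 0.10:
        return "OK"
    if ratio <= 0.5:
        return "LOW"
    return "HIGH"
```

For example, if 6 out of 10 sampled stack traces contain the blocking frame, `back_pressure_ratio` returns 0.6 and `classify` reports `"HIGH"`.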
Proposed Changes
1. Expose the new mechanism implemented in FLINK-14472 as an "is back-pressured" metric.
2. Show the vertex that is the source of backpressure for the job.
3. Expose network metrics in IOMetricsInfo:
- SubTask
  - pool usage: outPoolUsage, inputExclusiveBuffersUsage, inputFloatingBuffersUsage. These help determine:
    - whether the subtask is not back-pressured itself but is causing backpressure (full input, empty output);
    - by comparing exclusive and floating buffer usage, whether all channels are back-pressured or only some of them.
  - back-pressured: shows whether the subtask is back-pressured.
- Vertex
  - pool usage: outPoolUsageAvg, inputExclusiveBuffersUsageAvg, inputFloatingBuffersUsageAvg.
  - back-pressured: shows whether the vertex is back-pressured (merged over all of its subtasks).
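The "full input, empty output" heuristic and the per-vertex aggregation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from this issue: `SubtaskMetrics`, `is_likely_source`, `aggregate_vertex`, the 0.9/0.1 thresholds, the use of `max` to combine the two input usages, the `backPressured` key, and the choice to merge subtask flags with "any" are all hypothetical. Only the metric names (`outPoolUsage`, `inputExclusiveBuffersUsage`, `inputFloatingBuffersUsage`, and their `Avg` variants) come from the proposal.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass(frozen=True)
class SubtaskMetrics:
    """Per-subtask network metrics; usages are fractions in [0, 1]."""
    out_pool_usage: float
    input_exclusive_buffers_usage: float
    input_floating_buffers_usage: float
    back_pressured: bool


def is_likely_source(m: SubtaskMetrics, full: float = 0.9, empty: float = 0.1) -> bool:
    """True if the subtask is not back-pressured but looks like the cause:
    its input buffers are (nearly) full while its output pool is (nearly) empty.
    Thresholds are assumptions picked for this example."""
    input_usage = max(m.input_exclusive_buffers_usage, m.input_floating_buffers_usage)
    return (not m.back_pressured) and input_usage >= full and m.out_pool_usage <= empty


def aggregate_vertex(subtasks: List[SubtaskMetrics]) -> Dict[str, object]:
    """Average pool usages over all subtasks of a vertex and merge the
    back-pressured flags (assumed here: back-pressured if any subtask is)."""
    return {
        "outPoolUsageAvg": mean(s.out_pool_usage for s in subtasks),
        "inputExclusiveBuffersUsageAvg": mean(s.input_exclusive_buffers_usage for s in subtasks),
        "inputFloatingBuffersUsageAvg": mean(s.input_floating_buffers_usage for s in subtasks),
        "backPressured": any(s.back_pressured for s in subtasks),
    }
```

A subtask with full input buffers, an empty output pool, and `back_pressured=False` is flagged by `is_likely_source`, while an upstream subtask with a full output pool and `back_pressured=True` is not. A real check would also need the upstream metrics mentioned in the description.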
Attachments
Issue Links
- is related to
  - FLINK-14472 Implement back-pressure monitor with non-blocking outputs (Closed)
- relates to
  - FLINK-22253 Update backpressure monitoring documentation (Closed)
  - FLINK-20724 Create a http handler for aggregating metrics from whole job (Closed)
Sub-Tasks
1. Expose the new mechanism implemented in FLINK-14472 as a "is back-pressured" metric | Closed | lining
2. Expose network metric for job vertex in rest api | Closed | Unassigned
3. Expose network metric for sub task in rest api | Closed | Unassigned
4. Show the vertex that produces the backpressure source in the job | Closed | Piotr Nowojski
5. Better BackPressure Detection in WebUI | Closed | Unassigned
6. Create backPressuredTimeMsPerSecond metric | Closed | Piotr Nowojski
7. Create busyTimeMsPerSecond metrics | Closed | Piotr Nowojski
8. Enrich back pressure stats per subtask in the WebUI | Closed | Piotr Nowojski
9. Pause the idle/back pressure timers during processing mailbox actions | Closed | Piotr Nowojski