Currently the network layer provides two metrics, InputBufferPoolUsageGauge and OutputBufferPoolUsageGauge, which show the usage of the input and output buffer pools. When a task has multiple inputs (SingleInputGate) or outputs (ResultPartition), the two metrics report the average usage across them.
However, we found that the maximum usage across all InputBufferPools or OutputBufferPools is also useful for debugging back pressure. Suppose we have a job with the following job graph:
Besides, suppose D is very slow and thus causes back pressure, while E is very fast and F outputs few records, so the usage of their corresponding input/output buffer pools stays close to 0.
Then the average input/output buffer usage of each task will be:
But the maximum input/output buffer usage of each task will be:
Users will be able to locate the slowest task by finding the first task whose input buffer usage is 100% but whose output buffer usage is below 100%.
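The heuristic above can be sketched as a small scan over per-task usage readings. This is only an illustration: the task names reuse the ones from the example, but the concrete usage values and the `findBottleneck` helper are hypothetical, not part of any Flink API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BottleneckFinder {

    // Each value is {maxInputUsage, maxOutputUsage} for one task, in
    // topological order. The first task whose input side is saturated
    // (100%) while its output side is not is the one that consumes
    // slower than upstream produces, i.e. the bottleneck.
    static String findBottleneck(Map<String, float[]> usages) {
        for (Map.Entry<String, float[]> e : usages.entrySet()) {
            float in = e.getValue()[0];
            float out = e.getValue()[1];
            if (in >= 1.0f && out < 1.0f) {
                return e.getKey();
            }
        }
        return null; // no back pressure detected
    }
}
```

With illustrative readings such as B = {1.0, 1.0} (back-pressured but still blocked downstream) and D = {1.0, 0.2}, the scan would stop at D, matching the scenario where D is the slow task.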
If it is reasonable to show the maximum input/output buffer usage, there seem to be three options:
- Modify the current computation logic of InputBufferPoolUsageGauge and OutputBufferPoolUsageGauge.
- Add two new metrics items InputBufferPoolMaxUsageGauge and OutputBufferPoolMaxUsageGauge.
- Try to show distinct usage for each input/output buffer pool.
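To make the second option concrete, here is a minimal sketch of what an InputBufferPoolMaxUsageGauge could compute, contrasted with the averaging behavior of the existing gauge. The `BufferPool` interface and the usage formula (used buffers / total buffers) are simplified stand-ins, not Flink's actual internal API.

```java
import java.util.List;

public class MaxUsageGaugeSketch {

    // Minimal stand-in for a buffer pool that can report its usage.
    interface BufferPool {
        int getNumberOfUsedBuffers();
        int getNumberOfTotalBuffers();
    }

    static float usage(BufferPool p) {
        int total = p.getNumberOfTotalBuffers();
        return total == 0 ? 0f : (float) p.getNumberOfUsedBuffers() / total;
    }

    // Average usage across pools, mirroring what the existing
    // gauges are described to report for multiple inputs/outputs.
    static float averageUsage(List<BufferPool> pools) {
        if (pools.isEmpty()) {
            return 0f;
        }
        float sum = 0f;
        for (BufferPool p : pools) {
            sum += usage(p);
        }
        return sum / pools.size();
    }

    // Maximum usage across pools, as the proposed
    // InputBufferPoolMaxUsageGauge/OutputBufferPoolMaxUsageGauge
    // would report.
    static float maxUsage(List<BufferPool> pools) {
        float max = 0f;
        for (BufferPool p : pools) {
            max = Math.max(max, usage(p));
        }
        return max;
    }
}
```

With one fully used pool and one empty pool, the average gauge reads 50% and hides the saturation, while the max gauge reads 100% and exposes it, which is exactly the difference the proposal relies on.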
I think the second option is preferable, since it exposes the new information without changing the semantics of the existing metrics. What do you think?