Details

Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Runtime / Metrics
Labels: None

Description

A number of Flink metrics use System.currentTimeMillis[1] to measure elapsed time. I propose refactoring them from System.currentTimeMillis to System.nanoTime[2].

Why do we need to refactor it?

Note: Higher precision is not the reason for this refactoring.

Actually, System.currentTimeMillis() and System.nanoTime() have completely different semantics.

System.currentTimeMillis() != System.nanoTime() / 1_000_000

System.currentTimeMillis() is the current system (wall-clock) time of the server.
- The time can be updated by NTP[3], or it can be adjusted manually.
- Therefore, when we use System.currentTimeMillis, the end time may be less than the start time.
System.nanoTime() returns the time elapsed since a fixed but arbitrary origin, which is often the time the operating system was booted.
- So System.nanoTime isn't system time, and it's not affected by changes to the system time.
- Within a single JVM process, System.nanoTime is monotonic and never goes backwards.
- As the Javadoc[2] mentions: this method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time.
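
To illustrate the points above, here is a minimal sketch of measuring a duration with System.nanoTime. The class name ElapsedTimeSketch and the method measureMillis are hypothetical and only serve this example; they are not part of Flink.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of Flink.
final class ElapsedTimeSketch {

    /** Runs the task and returns its duration in milliseconds, measured with System.nanoTime. */
    static long measureMillis(Runnable task) {
        long startNanos = System.nanoTime();
        task.run();
        // Subtract first, then convert: only the difference of two nanoTime values is meaningful.
        long elapsedNanos = System.nanoTime() - startNanos;
        return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    }

    public static void main(String[] args) {
        long millis = measureMillis(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("elapsed ms: " + millis);
    }
}
```

Because the result is a difference of two System.nanoTime readings, an NTP correction or a manual clock change during the task should not make the measured duration negative.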

This blog post[4] explains the difference in detail.

Current use cases:

Based on the previous section, System.nanoTime is the recommended way to measure a duration.

Most tracing systems use it, and Flink already uses it to measure the duration for some metrics, such as:

- All latency tracking in the state backends
- SubtaskCheckpointCoordinatorImpl#takeSnapshotSync, which measures the checkpoint sync duration
- etc.

In addition, Flink's Clock[5] already provides absoluteTimeMillis, relativeTimeMillis and relativeTimeNanos, but I guess most developers don't know these details.

- absoluteTimeMillis uses System.currentTimeMillis
- relativeTimeMillis and relativeTimeNanos use System.nanoTime

It's better to call relativeTimeNanos or relativeTimeMillis instead of absoluteTimeMillis for all duration-related metrics.
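
As a rough sketch of what this could look like for a duration metric: the interface SketchClock below is a hypothetical stand-in that only mirrors the three method names of Flink's Clock[5] so that the example is self-contained, and DurationMetricSketch is likewise an invented name, not an existing Flink class.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in that mirrors the three method names of Flink's Clock[5].
interface SketchClock {
    long absoluteTimeMillis();
    long relativeTimeMillis();
    long relativeTimeNanos();
}

// Assumed mapping, as described in this issue: absolute -> currentTimeMillis, relative -> nanoTime.
final class SystemSketchClock implements SketchClock {
    @Override
    public long absoluteTimeMillis() {
        return System.currentTimeMillis();
    }

    @Override
    public long relativeTimeMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
    }

    @Override
    public long relativeTimeNanos() {
        return System.nanoTime();
    }
}

// Hypothetical duration metric that only reads the relative (monotonic) time.
final class DurationMetricSketch {
    private final SketchClock clock;
    private long startNanos;

    DurationMetricSketch(SketchClock clock) {
        this.clock = clock;
    }

    void start() {
        startNanos = clock.relativeTimeNanos();
    }

    /** Returns the milliseconds elapsed since start(), based on relative time only. */
    long stopAndGetMillis() {
        return TimeUnit.NANOSECONDS.toMillis(clock.relativeTimeNanos() - startNanos);
    }
}
```

Taking the clock as a constructor argument also makes it possible to pass a manually controlled clock in tests, so the metric can be checked without sleeping.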