Details
Improvement
Status: Resolved
Normal
Resolution: Fixed
None
Operability
Normal
All
None
Description
The metrics we have around the CommitLog are less useful than they could be when investigating the performance of local writes.
1.) We have no way to know how long the actual flush to disk takes in isolation, i.e. separate from the signaling apparatus between mutation threads and the sync thread. We should add a metric for this.
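A minimal sketch of what item 1 asks for, assuming a sync loop that first flushes and then signals waiters. FlushLatencyRecorder and SyncLoopSketch are hypothetical names made up for illustration, not Cassandra classes, and the recorder is a plain stand-in for whatever metrics Timer the real code would use. The point is only that the timed region wraps the flush call and nothing else:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a metrics timer; not Cassandra's actual metric class.
final class FlushLatencyRecorder
{
    private final LongAdder count = new LongAdder();
    private final LongAdder totalNanos = new LongAdder();

    void record(long nanos)
    {
        count.increment();
        totalNanos.add(nanos);
    }

    long count() { return count.sum(); }
    long totalNanos() { return totalNanos.sum(); }
}

// Hypothetical sync loop body: the flush and the signaling are passed in as
// Runnables so the sketch stays self-contained.
final class SyncLoopSketch
{
    private final FlushLatencyRecorder flushLatency;
    private final Runnable flushToDisk;
    private final Runnable signalWaiters;

    SyncLoopSketch(FlushLatencyRecorder flushLatency, Runnable flushToDisk, Runnable signalWaiters)
    {
        this.flushLatency = flushLatency;
        this.flushToDisk = flushToDisk;
        this.signalWaiters = signalWaiters;
    }

    void syncOnce()
    {
        long start = System.nanoTime();
        try
        {
            flushToDisk.run(); // only the flush itself is timed
        }
        finally
        {
            flushLatency.record(System.nanoTime() - start);
        }
        signalWaiters.run(); // waking mutation threads stays outside the timed region
    }
}
```

Each call to syncOnce() records exactly one flush duration, independent of how long mutation threads take to be signaled afterwards.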
2.) The WaitingOnCommit metric can record multiple data points for a single mutation, which is awkward when we’re trying to break down the latency of a local write (total time for CL add + Memtable put, etc.). More specifically, a thread waits for the sync thread to catch up to the position of its mutation, but it can be woken by a sync that hasn’t yet reached that position, which triggers another wait. Each such wait records a new data point for the metric. We should move the metric recording up a level so that there is a 1:1 relationship between it and WriteLatency in TableMetrics (which covers row cache updates and the Memtable put).
void waitForSync(int position, Timer waitingOnCommit)
{
    while (lastSyncedOffset < position)
    {
        WaitQueue.Signal signal = waitingOnCommit != null ? syncComplete.register(waitingOnCommit.time()) : syncComplete.register();
        if (lastSyncedOffset < position)
            signal.awaitUninterruptibly();
        else
            signal.cancel();
    }
}
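One way to picture the change proposed in item 2 is sketched here, under loud assumptions: SyncWaitSketch, awaitAndRecord, and markSynced are hypothetical names, a plain monitor with wait/notifyAll replaces the WaitQueue, and two LongAdder counters replace the real Timer. The inner loop may still wake several times before the sync position is reached, but the timing wraps the whole call, so one mutation yields one data point:

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch; a plain monitor stands in for the WaitQueue in the ticket.
final class SyncWaitSketch
{
    private final Object monitor = new Object();
    private int lastSyncedOffset = 0; // guarded by monitor

    // Stand-ins for the WaitingOnCommit metric.
    final LongAdder waitSamples = new LongAdder();
    final LongAdder waitNanos = new LongAdder();

    // Inner wait: may loop through several wakeups, records nothing.
    void waitForSync(int position)
    {
        boolean interrupted = false;
        synchronized (monitor)
        {
            while (lastSyncedOffset < position)
            {
                try
                {
                    monitor.wait();
                }
                catch (InterruptedException e)
                {
                    interrupted = true; // keep waiting, restore the flag afterwards
                }
            }
        }
        if (interrupted)
            Thread.currentThread().interrupt();
    }

    // Timing moved up a level: exactly one sample per call, however many wakeups occur.
    void awaitAndRecord(int position)
    {
        long start = System.nanoTime();
        try
        {
            waitForSync(position);
        }
        finally
        {
            waitSamples.increment();
            waitNanos.add(System.nanoTime() - start);
        }
    }

    // Called by the (hypothetical) sync thread after each flush.
    void markSynced(int offset)
    {
        synchronized (monitor)
        {
            if (offset > lastSyncedOffset)
                lastSyncedOffset = offset;
            monitor.notifyAll();
        }
    }
}
```

A thread calling awaitAndRecord(10) while the sync thread advances through offsets 3, 7, and 10 may wake up to three times, yet waitSamples increases by one, which is what makes the metric line up 1:1 with a per-mutation latency such as WriteLatency.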