I noticed the values of total metrics for streams were decreasing periodically when viewed in JMX, for example process-total for each processor-node-id under stream-processor-node-metrics.
Edit: For processor node metrics, I should have been looking at ProcessorNode, not StreamsMetricsThreadImpl.
Looking at StreamsMetricsThreadImpl, I believe this behavior is due to using Count() as the Stat for the *-total metrics. Count() is a SampledStat, so the value it reports is the count over recent time windows only, and that value decreases whenever a window is purged. This explains the behavior I saw, but I think the issue is deeper. For example, processTimeSensor attempts to measure process-latency-avg, process-latency-max, process-rate, and process-total. For that sensor, record is called like this: streamsMetrics.processTimeSensor.record(computeLatency() / (double) processed, timerStartedMs);
so the value passed to record is, if I understand correctly, the average latency per processed message in this batch. That value gets pushed through to Count#record, which increments its count by 1 and ignores the value parameter. What the stat recording the total would need to know is the number of messages processed. Because of that, I don't think it's possible for one Sensor to measure both latency and total. That said, it's not clear to me how all the different Stats work or exactly how Sensors work. For similar reasons I don't understand how the process-rate metric works either, but it seems to be correct, so I may be missing something here.
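
To make the suspected behavior concrete, here is a minimal toy model in plain Java. It is not Kafka's code: MetricsSketch, WindowedCount and CumulativeSum are hypothetical names, and the window handling is simplified compared to whatever SampledStat really does. It only illustrates the two effects described above, assuming my reading is right: a windowed count drops when an old window is purged, and a count that ignores the recorded value counts record() calls rather than processed messages.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MetricsSketch {

    /** Toy windowed count (hypothetical, not Kafka's Count): counts record() calls in recent windows. */
    static class WindowedCount {
        private final long windowMs;
        private final int maxWindows;
        // Each entry is {windowStartMs, count}.
        private final Deque<long[]> windows = new ArrayDeque<>();

        WindowedCount(long windowMs, int maxWindows) {
            this.windowMs = windowMs;
            this.maxWindows = maxWindows;
        }

        void record(double value, long nowMs) {
            // The value is ignored on purpose: every call adds exactly 1.
            long[] last = windows.peekLast();
            if (last == null || nowMs - last[0] >= windowMs) {
                last = new long[] {nowMs, 0};
                windows.addLast(last);
            }
            last[1]++;
        }

        long measure(long nowMs) {
            // Purge windows that have fallen out of the retention period.
            while (!windows.isEmpty() && nowMs - windows.peekFirst()[0] >= windowMs * maxWindows) {
                windows.removeFirst();
            }
            long total = 0;
            for (long[] w : windows) {
                total += w[1];
            }
            return total;
        }
    }

    /** Toy cumulative total (hypothetical): adds the recorded value and never forgets. */
    static class CumulativeSum {
        private double total;

        void record(double value, long nowMs) {
            total += value;
        }

        double measure(long nowMs) {
            return total;
        }
    }

    public static void main(String[] args) {
        WindowedCount windowed = new WindowedCount(30_000L, 2);
        CumulativeSum cumulative = new CumulativeSum();

        // Batch 1 at t=0s: 10 messages, average latency 1.5 ms per message.
        windowed.record(1.5, 0L);
        cumulative.record(10, 0L);
        // Batch 2 at t=40s: 5 messages, average latency 2.0 ms per message.
        windowed.record(2.0, 40_000L);
        cumulative.record(5, 40_000L);

        System.out.println(windowed.measure(45_000L));   // 2: two record() calls, not 15 messages
        System.out.println(windowed.measure(65_000L));   // 1: the first window was purged
        System.out.println(cumulative.measure(65_000L)); // 15.0: never decreases
    }
}
```

In this sketch the cumulative stat only gives the right total because it is fed the number of messages rather than the latency, which is the reason I suspect one sensor cannot serve both purposes.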