Details
Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Description
After the container timer metrics (chooseNs, windowNs, processNs, and commitNs) were changed from milliseconds to nanoseconds, we noticed a dramatic increase in heap memory usage in one of our production jobs. After investigating, we found that SlidingTimeWindowReservoir.update(duration) is now called much more frequently, because the duration is non-zero when measured in nanoseconds (in contrast, it was often zero when measured in milliseconds). Within the 5-minute window, the storage inside SlidingTimeWindowReservoir grows substantially for a high-QPS job: for our job, at around 10K QPS, heap usage increased from <5M to 100M. This causes long GC pauses and degrades job performance.
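The effect described above can be illustrated with a minimal sketch. This is not the actual SlidingTimeWindowReservoir code: SimpleSlidingWindow, retainedSamples, and the 50-microsecond call duration are hypothetical, and the sketch assumes (as the description implies) that zero durations never reach update(). It only shows how the number of retained samples in a 5-minute window depends on whether durations are truncated to milliseconds.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only; not the real reservoir implementation.
class ReservoirGrowthSketch {

    /** Simplified stand-in for a sliding time window: keeps every sample newer than windowMs. */
    static final class SimpleSlidingWindow {
        private final long windowMs;
        // Each entry is {timestampMs, value}, in insertion (time) order.
        private final Deque<long[]> samples = new ArrayDeque<>();

        SimpleSlidingWindow(long windowMs) {
            this.windowMs = windowMs;
        }

        void update(long value, long nowMs) {
            samples.addLast(new long[] {nowMs, value});
            // Evict samples that have fallen out of the window.
            while (!samples.isEmpty() && samples.peekFirst()[0] <= nowMs - windowMs) {
                samples.removeFirst();
            }
        }

        int size() {
            return samples.size();
        }
    }

    /**
     * Spreads `events` timer readings evenly over one 5-minute window and
     * returns how many samples the window retains at the end.
     */
    static int retainedSamples(int events, long durationNs, boolean truncateToMillis) {
        long windowMs = 5 * 60 * 1000L;
        SimpleSlidingWindow window = new SimpleSlidingWindow(windowMs);
        for (int i = 0; i < events; i++) {
            long nowMs = (long) i * windowMs / events;
            long duration = truncateToMillis ? durationNs / 1_000_000L : durationNs;
            // Assumption: a zero duration is skipped and never stored.
            if (duration > 0) {
                window.update(duration, nowMs);
            }
        }
        return window.size();
    }

    public static void main(String[] args) {
        int events = 100_000;      // hypothetical number of timed calls in the window
        long durationNs = 50_000L; // hypothetical 50-microsecond call
        System.out.println("ms units: " + retainedSamples(events, durationNs, true));
        System.out.println("ns units: " + retainedSamples(events, durationNs, false));
    }
}
```

Under these assumptions, sub-millisecond durations truncate to zero and are never stored, while the same durations in nanoseconds each add one entry that stays in memory until it ages out of the window. The retained sample count therefore scales with the call rate multiplied by the window length.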