Description
During the life of a shard process, memory fragmentation can occur when there is heavy loading and unloading from the block cache (as tables come and go) combined with many data updates. The issue appears to be that UnsafeCacheValue allocates many small chunks of memory and later releases them when the cache reuse queue overflows. The end result is that the shard process grows much larger than the off-heap cache and heap combined. If the shard process is aggressively configured for the server it runs on, it can use too much memory, and once the server is in jeopardy the Linux out-of-memory killer will end up terminating the process. This normally ends in a cascading failure of an entire shard cluster.
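To make the suspected allocation pattern concrete, here is a minimal hypothetical sketch. It is not Blur's actual UnsafeCacheValue code. The class name ReuseQueueSketch, the chunk size, the bounded ArrayBlockingQueue used as the "reuse queue", and the method names are all assumptions for illustration. Each chunk is a small native allocation made with sun.misc.Unsafe.allocateMemory; on release, the chunk goes back into the reuse queue if there is room, and is otherwise handed back to the native allocator with freeMemory. Under the hypothesis in this issue, a long-running stream of these small allocate/free cycles is what fragments the native heap.

```java
import java.lang.reflect.Field;
import java.util.concurrent.ArrayBlockingQueue;
import sun.misc.Unsafe;

/** Hypothetical sketch of an allocate / reuse / free-on-overflow pattern (not Blur's real code). */
public class ReuseQueueSketch {
    private static final Unsafe UNSAFE = loadUnsafe();

    // sun.misc.Unsafe has no public constructor; obtain the singleton via reflection.
    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    private final int chunkSize;
    // Bounded queue of previously released native addresses that can be reused.
    private final ArrayBlockingQueue<Long> reuseQueue;
    private long freedOnOverflow = 0; // chunks passed to freeMemory because the queue was full

    public ReuseQueueSketch(int chunkSize, int queueCapacity) {
        this.chunkSize = chunkSize;
        this.reuseQueue = new ArrayBlockingQueue<>(queueCapacity);
    }

    /** Reuse a queued chunk if one is available, otherwise allocate a new small native chunk. */
    public long acquire() {
        Long addr = reuseQueue.poll();
        return addr != null ? addr : UNSAFE.allocateMemory(chunkSize);
    }

    /** Return a chunk; if the reuse queue is full, it overflows and the chunk is freed. */
    public void release(long addr) {
        if (!reuseQueue.offer(addr)) {
            // Many frees of small chunks like this are the suspected source of fragmentation.
            UNSAFE.freeMemory(addr);
            freedOnOverflow++;
        }
    }

    public long freedOnOverflow() {
        return freedOnOverflow;
    }

    /** Free every chunk still parked in the reuse queue. */
    public void close() {
        Long addr;
        while ((addr = reuseQueue.poll()) != null) {
            UNSAFE.freeMemory(addr);
        }
    }

    public static void main(String[] args) {
        ReuseQueueSketch cache = new ReuseQueueSketch(1024, 4);
        long[] addrs = new long[10];
        for (int i = 0; i < addrs.length; i++) {
            addrs[i] = cache.acquire(); // queue is empty, so all 10 are fresh allocations
        }
        for (long a : addrs) {
            cache.release(a); // first 4 fit in the queue, remaining 6 overflow and are freed
        }
        System.out.println("freed on overflow: " + cache.freedOnOverflow()); // prints "freed on overflow: 6"
        cache.close();
    }
}
```

Note that freeMemory returns the memory to the process's native allocator, which does not necessarily return it to the operating system; that gap is how the resident size of the process can exceed the configured off-heap cache plus heap.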