Details
- Sub-task
- Status: Resolved
- Major
- Resolution: Fixed
Description
When testing zero-copy in Ozone (stateMachineCache enabled), we saw hundreds of thousands of ServerProtocol messages left unclosed, even though the number of entries cached in the Ozone StateMachine was small (<500). Netty's direct memory utilization was also high and did not go down after the test run finished.
It turns out that an appendEntries request can contain multiple log entries. Some are metadata or configuration entries, which are small (~10-20 bytes). Others are StateMachine entries, which are much bigger (4 MB).
Today, when stateMachineCache is enabled, the StateMachine entries stored in LogCache don't hold a reference count on the original appendEntries request, but metadata and configuration entries do. Because metadata and configuration entries are small, they almost never fill up the LogCache enough to trigger a cacheEvict. Their references to the original appendEntries request therefore keep the request buffer from being released, even after the StateMachine cache evicts the StateMachine entries.
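The retention pattern described above can be illustrated with a toy model. This is only a sketch and not Ratis code: `RequestBuffer`, `CachedEntry`, and `countPinnedBuffers` are hypothetical names, and the model assumes, for simplicity, that each request carries one large StateMachine entry and one small metadata entry, each holding one reference to the request buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the leak; every name here is hypothetical, not a Ratis API. */
public class RetainedRequestSketch {
  /** Stands in for the direct buffer backing one appendEntries request. */
  static final class RequestBuffer {
    private final AtomicInteger refCount = new AtomicInteger();
    void retain() { refCount.incrementAndGet(); }
    void release() { refCount.decrementAndGet(); }
    boolean isReleased() { return refCount.get() == 0; }
  }

  /** A cached entry that holds one reference to its originating request. */
  static final class CachedEntry {
    final int sizeBytes;
    final RequestBuffer source;
    CachedEntry(int sizeBytes, RequestBuffer source) {
      this.sizeBytes = sizeBytes;
      this.source = source;
      source.retain();
    }
    void evict() { source.release(); }
  }

  /** Counts request buffers still pinned after all large entries are evicted. */
  static int countPinnedBuffers(int requests) {
    List<RequestBuffer> buffers = new ArrayList<>();
    List<CachedEntry> largeEntries = new ArrayList<>();
    List<CachedEntry> smallEntries = new ArrayList<>();
    for (int i = 0; i < requests; i++) {
      RequestBuffer buffer = new RequestBuffer();
      buffers.add(buffer);
      largeEntries.add(new CachedEntry(4 << 20, buffer)); // ~4 MB StateMachine entry
      smallEntries.add(new CachedEntry(16, buffer));      // ~16-byte metadata entry
    }
    // The large entries are evicted and drop their references.
    largeEntries.forEach(CachedEntry::evict);
    // The small entries total only 16 bytes per request, so in this model
    // they never reach an eviction threshold and keep their references.
    int pinned = 0;
    for (RequestBuffer buffer : buffers) {
      if (!buffer.isReleased()) {
        pinned++;
      }
    }
    return pinned;
  }

  public static void main(String[] args) {
    System.out.println(countPinnedBuffers(1000)); // prints 1000
  }
}
```

In this model every request buffer stays pinned by a 16-byte entry, which matches the symptom of many unclosed messages alongside a small number of cached StateMachine entries.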
When stateMachineCache is enabled, the metadata and config entries should not hold a reference to the original appendEntries request.
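One way a small entry can avoid holding such a reference is to copy its few bytes out of the request buffer. The sketch below assumes that approach purely for illustration; the issue text does not say how the actual change was implemented, and `copyOut` is a hypothetical helper, not a Ratis method.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Illustrative only: copies a small entry out of a shared request buffer. */
public class CopySmallEntrySketch {
  /**
   * Returns an independent heap copy of length bytes starting at offset.
   * The copy has no tie to the request buffer, so the buffer can be
   * released without affecting the cached entry.
   */
  static byte[] copyOut(ByteBuffer request, int offset, int length) {
    byte[] copy = new byte[length];
    // duplicate() shares content but has its own position and limit,
    // so reading here does not disturb the caller's view of the buffer.
    ByteBuffer view = request.duplicate();
    view.position(offset);
    view.get(copy, 0, length);
    return copy;
  }

  public static void main(String[] args) {
    // Stand-in for a direct buffer holding one appendEntries request.
    ByteBuffer request = ByteBuffer.allocateDirect(8);
    request.put(new byte[] {1, 2, 3, 4, 5, 6, 7, 8});

    byte[] smallEntry = copyOut(request, 2, 3);

    // Simulate the buffer being reused after release: overwrite its content.
    request.put(2, (byte) 9);

    System.out.println(Arrays.toString(smallEntry)); // prints [3, 4, 5]
  }
}
```

Copying 10-20 bytes per entry is cheap compared with pinning a multi-megabyte request buffer, which is why the trade-off favors small entries giving up zero-copy.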
I did a quick test and compared the direct memory utilization and the number of unreleased messages before and after making the change.
Attachments
Issue Links
- is related to RATIS-2094 TransactionContext's stateMachineLogEntry and stateMachineContext may cause corruption (Resolved)
- links to