Details
- Bug
- Status: Resolved
- Blocker
- Resolution: Information Provided
- 2.16.0
- None
- None
Description
Same scenario as ARTEMIS-2421.
If the network between the Live Broker (B1) and the NFS server is disconnected (for example, by rejecting its TCP packets with iptables), the following happens after the lock lease timeout:
- The backup server (B2) becomes Live
- When NFS connectivity of B1 is restored, B1 remains Live
So both brokers are live.
The issue seems to be caused by the use of java.nio.channels.FileLock#isValid in org.apache.activemq.artemis.core.server.impl.FileLockNodeManager#isLiveLockLost: it always returns true, even if the lock has meanwhile been lost and taken by B2.
Do you suggest using specific mount options for NFS?
Or should the lock evaluation be replaced with a more reliable mechanism? We noticed that FileLock#isValid returns a cached value (true) even when NFS connectivity is down, so it would be better to use a validation mechanism that forces a query to the NFS server.
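To illustrate the kind of validation meant here, below is a minimal sketch, not Artemis code. Per its Javadoc, a FileLock object stays valid until it is released or its channel is closed, so isValid reflects only local JVM state. The sketch pairs it with a check that does real file I/O: the holder stamps a random owner id into the lock file and later re-reads it. The class LockOwnershipProbe, its method names, and the owner-id scheme are all hypothetical. Whether the read actually reaches the NFS server depends on client caching and mount options, which this sketch does not address, and it assumes a newly active broker rewrites the file when it takes over.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.UUID;

/** Hypothetical helper (not part of Artemis): checks lock ownership with real file I/O. */
final class LockOwnershipProbe implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;
    private final byte[] ownerId;

    private LockOwnershipProbe(FileChannel channel, FileLock lock, byte[] ownerId) {
        this.channel = channel;
        this.lock = lock;
        this.ownerId = ownerId;
    }

    /** Takes the lock and stamps a random owner id at the start of the file. */
    static LockOwnershipProbe acquire(Path lockFile) {
        try {
            FileChannel ch = FileChannel.open(lockFile, StandardOpenOption.CREATE,
                    StandardOpenOption.READ, StandardOpenOption.WRITE);
            FileLock l = ch.tryLock();
            if (l == null) {
                ch.close();
                throw new IllegalStateException("lock is held by another process");
            }
            byte[] id = UUID.randomUUID().toString().getBytes(StandardCharsets.US_ASCII);
            ch.truncate(0);
            ByteBuffer buf = ByteBuffer.wrap(id);
            while (buf.hasRemaining()) {
                ch.write(buf, buf.position()); // positional write starting at offset 0
            }
            ch.force(true); // flush the stamp to the underlying storage
            return new LockOwnershipProbe(ch, l, id);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** What FileLock#isValid alone reports: local state only, no I/O. */
    boolean localLockValid() {
        return lock.isValid();
    }

    /** Local check plus a re-read of the owner id; any I/O error counts as "lost". */
    boolean isStillOwner() {
        if (!lock.isValid()) {
            return false;
        }
        try {
            ByteBuffer buf = ByteBuffer.allocate(ownerId.length);
            while (buf.hasRemaining()) {
                if (channel.read(buf, buf.position()) < 0) {
                    break; // file is shorter than our stamp
                }
            }
            return Arrays.equals(buf.array(), ownerId);
        } catch (IOException e) {
            return false;
        }
    }

    @Override
    public void close() {
        try {
            channel.close(); // closing the channel also releases the lock
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch a monitoring thread would call isStillOwner() periodically, ideally with a timeout around the call, since I/O on a hard-mounted NFS share can block while the server is unreachable.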
Issue Links
- is related to ARTEMIS-2808 Artemis HA with shared storage strategy does not reconnect with shared storage if reconnection happens at shared storage (Closed)