  ActiveMQ Artemis / ARTEMIS-3030

Journal lock evaluation fails when NFS is temporarily disconnected


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Information Provided
    • Affects Version/s: 2.16.0
    • Fix Version/s: None
    • Component/s: Broker
    • Labels: None

    Description

      Same scenario as ARTEMIS-2421.

      If the network between the live broker (B1) and the NFS server is disconnected (for example, by rejecting its TCP packets with iptables), the following happens after the lock lease timeout:

      • The backup server (B2) becomes live.
      • When B1's NFS connectivity is restored, B1 remains live.

      As a result, both brokers are live at the same time.
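To make the expected behaviour concrete: once B1 has gone a full lock lease timeout without confirming its lock with the NFS server, B2 is entitled to take over, so B1 should already have stopped acting as live. The sketch below models that rule with a hypothetical `LeaseTracker` class; it is not an Artemis API, and the 30-second lease and the timestamps are illustrative assumptions.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical helper (not part of Artemis): remembers when the live broker
 * last confirmed, via a round trip to the NFS server, that it still holds the lock.
 */
public class LeaseTracker {
    private final Duration leaseTimeout;
    private Instant lastConfirmed;

    public LeaseTracker(Duration leaseTimeout, Instant start) {
        this.leaseTimeout = leaseTimeout;
        this.lastConfirmed = start;
    }

    /** Record a successful confirmation that actually reached the server. */
    public void confirmed(Instant now) {
        lastConfirmed = now;
    }

    /**
     * True once a full lease has elapsed without confirmation: from then on the
     * backup may already own the lock, so this broker must stop acting as live.
     */
    public boolean mustStepDown(Instant now) {
        return Duration.between(lastConfirmed, now).compareTo(leaseTimeout) >= 0;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2020-01-01T00:00:00Z");
        LeaseTracker lease = new LeaseTracker(Duration.ofSeconds(30), t0);
        lease.confirmed(t0.plusSeconds(10)); // last successful renewal before the link drops
        System.out.println(lease.mustStepDown(t0.plusSeconds(35))); // false: 25 s without confirmation
        System.out.println(lease.mustStepDown(t0.plusSeconds(40))); // true: 30 s without confirmation
    }
}
```

In the reported scenario, B1 has no mechanism that plays the role of `mustStepDown`, so nothing tells it to stop after the lease expires.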

      The issue seems to be caused by java.nio.channels.FileLock#isValid, which is used in org.apache.activemq.artemis.core.server.impl.FileLockNodeManager#isLiveLockLost: it always returns true, even if the lock has meanwhile been lost and taken by B2.
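For illustration, here is a standalone sketch (not Artemis code) of what `FileLock#isValid` actually checks. Per the `java.nio.channels.FileLock` Javadoc, a lock object remains valid until it is released, the channel used to acquire it is closed, or the JVM terminates. None of these conditions involve contacting the file server, so a lock taken on an NFS mount keeps reporting `true` while the network is down. The class name `IsValidDemo` and the temporary file are assumptions; the demo runs on a local file and does not reproduce the NFS disconnection itself.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class IsValidDemo {
    /** Returns {isValid() while the lock is held, isValid() after its channel is closed}. */
    static boolean[] observe(Path file) throws IOException {
        FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
        FileLock lock;
        boolean whileHeld;
        try {
            lock = channel.lock(); // exclusive lock on the whole file
            // isValid() only inspects local JVM state (not released, channel still open);
            // it sends nothing to the file system or to an NFS server.
            whileHeld = lock.isValid();
        } finally {
            channel.close(); // closing the channel also releases the lock
        }
        boolean afterClose = lock.isValid();
        return new boolean[] {whileHeld, afterClose};
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("server", ".lock");
        try {
            boolean[] r = observe(file);
            // prints "while held: true, after close: false"
            System.out.println("while held: " + r[0] + ", after close: " + r[1]);
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The only way to make `isValid()` return `false` is a local action such as releasing the lock or closing its channel. A lock lost on the server side therefore goes unnoticed by this check.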

      Do you suggest using specific mount options for NFS?

      Or should the lock evaluation be replaced with a more reliable mechanism? We noticed that FileLock#isValid returns a cached value (true) even when NFS connectivity is down, so it would be better to use a validation mechanism that forces a query to the NFS server.
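The following is a minimal sketch of the kind of check the question describes. It assumes a hypothetical heartbeat file on the same NFS mount, because writing into the broker's own lock file could clobber its contents. The sketch writes a timestamp and calls `FileChannel#force(true)`, which should require the data to reach the server rather than stay in the client cache, and it treats an I/O error or a timeout as failure. `NfsProbe`, `probe` and the timeout value are illustrative names, not Artemis APIs. A successful probe only shows that the server is reachable, not that the lock is still held. A real check would still have to compare the time since the last successful probe against the lock lease timeout.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class NfsProbe {
    // Daemon threads, so a probe stuck on an unreachable mount cannot keep the JVM alive.
    private static final ExecutorService IO = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "nfs-probe");
        t.setDaemon(true);
        return t;
    });

    /**
     * Hypothetical probe: writes the current time to heartbeatFile and forces it to
     * storage. Returns false on an I/O error, or if the write plus force does not
     * complete within timeoutMillis.
     */
    static boolean probe(Path heartbeatFile, long timeoutMillis) {
        Future<Object> result = IO.submit(() -> {
            try (FileChannel ch = FileChannel.open(heartbeatFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
                buf.putLong(System.currentTimeMillis()).flip();
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                // On NFS, force() should not return until the server has acknowledged the data.
                ch.force(true);
            }
            return null;
        });
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // best effort: the worker may stay blocked on a hard mount
            return false;
        } catch (ExecutionException e) {
            return false; // the write or force failed, e.g. with an I/O error
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path heartbeat = Files.createTempFile("heartbeat", ".bin");
        try {
            System.out.println("server reachable: " + probe(heartbeat, 5_000));
        } finally {
            Files.deleteIfExists(heartbeat);
        }
    }
}
```

The probe runs on a separate thread because a write to an unreachable server on a `hard` NFS mount can block indefinitely. The timeout lets the caller give up even if that worker thread stays stuck.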

            People

              Assignee: Francesco Nigro (nigrofranz)
              Reporter: Apache Dev (apachedev)
              Votes: 3
              Watchers: 6
