ActiveMQ / AMQ-3542

Using failover: with static discovery in a network connector to choose from a master/slave tuple leads to hangs and invalid states

    Details

      Description

      Static discovery tries to connect to every URL it is given. When the list is a master/slave pair with shared storage, only one broker will be active, leading to log messages that indicate repeated failures to connect.
      A potential solution is to use failover: just to pick a URL, while leaving recovery to the network connector so that the network bridge is correctly stopped and restarted.
      static:(failover:(tcp://a:61616,tcp://slave:61616)?maxReconnectAttempts=..)
      This does not work reliably at the moment because of inconsistencies in the failover reconnect logic, the network connector's interest in transport interruption/resumption, and the lack of thread safety in tracking existing bridges.

          Activity

          Gary Tully added a comment -

          fix and test in: http://svn.apache.org/viewvc?rev=1183062&view=rev

          static:(failover:(a,b)?maxReconnectAttempts=0) now works reliably to choose a working URL from (a,b) and report failures without trying to reconnect or fail over. The network connector then gets to recover using its stop/restart logic.
          This is a usage pattern for failover: where it is used solely to pick a valid broker URL. It does not try to recover a failed connection. Network connectors need to manage failed connections themselves because they are not simple JMS client connections.
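
As a rough illustration of the URI shape described above, the following sketch assembles a network connector address from a list of broker URLs. It uses only the Java standard library; the class and method names are hypothetical, and the host names are the placeholders from the description, not real brokers.

```java
import java.util.List;

public class NetworkConnectorUriSketch {

    // Hypothetical helper: wraps the candidate broker URLs in failover:(...) and
    // then in static:(...), so that failover: only picks a working URL and the
    // network connector handles recovery itself.
    static String staticFailoverUri(List<String> brokerUrls, int maxReconnectAttempts) {
        if (brokerUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one broker URL is required");
        }
        return "static:(failover:(" + String.join(",", brokerUrls)
                + ")?maxReconnectAttempts=" + maxReconnectAttempts + ")";
    }

    public static void main(String[] args) {
        // maxReconnectAttempts=0 is the value used in the comment above.
        String uri = staticFailoverUri(List.of("tcp://a:61616", "tcp://slave:61616"), 0);
        System.out.println(uri);
        // prints static:(failover:(tcp://a:61616,tcp://slave:61616)?maxReconnectAttempts=0)
    }
}
```

The resulting string is what would be supplied as the network connector's discovery address, for example in the uri attribute of a networkConnector element in the broker configuration.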

          Note: this commit changes the default value of maxReconnectAttempts for the failover: transport. A value of 0 now disables reconnection; the default, which gives infinite retries, is now -1.
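
The new meaning of maxReconnectAttempts can be modelled with a small predicate. This is a simplified sketch of the semantics stated in the note, not the failover transport's actual code; the class and method names are hypothetical, and values below -1 are not modelled.

```java
public class ReconnectPolicySketch {

    // Returns true if another reconnect attempt would be allowed under the
    // semantics described in the note:
    //   -1 -> retry forever (the new default)
    //    0 -> never reconnect
    //    n -> allow up to n attempts
    static boolean mayReconnect(int maxReconnectAttempts, int attemptsSoFar) {
        if (maxReconnectAttempts == -1) {
            return true;
        }
        return attemptsSoFar < maxReconnectAttempts;
    }

    public static void main(String[] args) {
        System.out.println(mayReconnect(0, 0));    // prints false
        System.out.println(mayReconnect(-1, 1000)); // prints true
        System.out.println(mayReconnect(3, 3));    // prints false
    }
}
```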

          Martin Serrano added a comment -

          Note: make sure you do not have randomize=false on your failover URI, as I did. I copied it from a place where that makes sense. In this situation, it means the "a" part of the failover would always be chosen when recovery occurs, and it will never fail over. That was a wasted day.
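
The effect of randomize on which URL is attempted first can be pictured with the sketch below. It is a simplified model with hypothetical names, assuming the candidate list is either kept in its configured order or shuffled; it is not the transport's real implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class FailoverOrderSketch {

    // Returns the order in which candidate URLs would be attempted.
    // With randomize=false the configured order is kept, so the first URL
    // is always tried first; with randomize=true the list is shuffled.
    static List<String> candidateOrder(List<String> urls, boolean randomize, Random rng) {
        List<String> order = new ArrayList<>(urls);
        if (randomize) {
            Collections.shuffle(order, rng);
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> urls = List.of("tcp://a:61616", "tcp://slave:61616");
        // randomize=false: "a" is always the first candidate.
        System.out.println(candidateOrder(urls, false, new Random()));
        // randomize=true: either broker may come first.
        System.out.println(candidateOrder(urls, true, new Random()));
    }
}
```

Combined with maxReconnectAttempts=0, a fixed order means each recovery starts again at the same first URL, which matches the behaviour described in the comment.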

            People

            • Assignee: Gary Tully
            • Reporter: Gary Tully