ActiveMQ / AMQ-2080

InitialReconnectDelay appears to be ignored in Discovery transport URLs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 5.2.0
    • Fix Version/s: 5.2.0
    • Component/s: Transport
    • Labels:
      None
    • Environment:

      Windows XP SP3

      Description

      Using a connection URL of

      discovery:(multicast://default?group=test)?maxReconnectAttempts=13&initialReconnectDelay=1000&useExponentialBackOff=false

      one would expect initial connection attempts to go on for at least 13 seconds (13 reconnect attempts with a 1000ms delay between attempts), but in fact the error "No uris available to connect to" is returned in less than a second. Changing useExponentialBackOff to true delays the failure report to about 41 seconds, which is 10ms * 2^12. That is what you'd expect with 12 reconnect attempts (13 connect attempts) starting with the default 10ms delay and doubling with every attempt, since 2^0 + 2^1 + 2^2 + ... + 2^(n-1) is approximately 2^n. (I guess maxReconnectAttempts should be called maxConnectAttempts, but I'm not opening a bug about that.) Changing maxReconnectAttempts to 12 causes the delay to be about 20 seconds, half of what it is for 13, so that checks out.

      Altogether this points to the initialReconnectDelay parameter being ignored on initial connection attempts. It is supposed to work per http://activemq.apache.org/discovery-transport-reference.html
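
The arithmetic in the description can be checked with a minimal sketch. This models only the reporter's reasoning, not ActiveMQ's actual retry loop: the function name `total_wait_ms` and its parameters are hypothetical, and it assumes that N allowed attempts mean N - 1 sleeps between them, with each sleep doubling when backoff is enabled.

```python
def total_wait_ms(max_reconnect_attempts, delay_ms=10, exponential=True, multiplier=2):
    """Total time spent sleeping before giving up, under the reporter's model.

    Assumption: N allowed attempts are separated by N - 1 sleeps. With backoff,
    each sleep is `multiplier` times the previous one; without it, every sleep
    lasts `delay_ms`.
    """
    sleeps = max_reconnect_attempts - 1
    total = 0
    current = delay_ms
    for _ in range(sleeps):
        total += current
        if exponential:
            current *= multiplier
    return total


print(total_wait_ms(13))                     # 40950 ms, the "about 41 seconds" case
print(total_wait_ms(12))                     # 20470 ms, the "about 20 seconds" case
print(total_wait_ms(13, exponential=False))  # 120 ms, well under a second
```

Under this model, a constant 10ms delay gives up after 120ms, which matches the "less than a second" observation and is consistent with the 1000ms setting not being applied at this stage.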

        Activity

        Jeremy Grodberg created issue -
        Gary Tully made changes -
        Assignee Gary Tully [ gtully ]
        Gary Tully added a comment -

        The initialReconnectDelay kicks in when a candidate URL is returned by the discovery agent. Up to that point, the reconnectDelay is in effect, and this defaults to 10 ms. I think there is some value in leaving things as they are, so that it is possible to have one reconnect delay while discovery is finding candidate URLs and a different one for connecting to those URLs.

        I added a test that verifies the setting of reconnectDelay takes effect when no broker can be found. It uses a URI of the form:

        discovery:(multicast://default)?useExponentialBackOff=false&maxReconnectAttempts=2&reconnectDelay=4000

        Does this work for you? If so, I can update the documentation with a reference to reconnectDelay.
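
The two-phase behavior described in this comment can be summarized in a small sketch. It is an illustration of the explanation only, not code from ActiveMQ: the function `effective_delay_ms` and the dictionary of options are hypothetical, and the only default taken from the thread is the 10 ms reconnectDelay.

```python
DEFAULT_RECONNECT_DELAY_MS = 10  # default stated in the comment above


def effective_delay_ms(options, candidates_discovered):
    """Return the delay that applies in each phase, per the explanation above.

    Before the discovery agent returns any candidate URL, reconnectDelay is
    used. Once candidates exist, initialReconnectDelay applies. None means the
    option was not set here and the library's own default would apply.
    """
    if candidates_discovered:
        return options.get("initialReconnectDelay")
    return options.get("reconnectDelay", DEFAULT_RECONNECT_DELAY_MS)


# Options from the original report: initialReconnectDelay is set, reconnectDelay is not.
reported = {"maxReconnectAttempts": 13, "initialReconnectDelay": 1000}
print(effective_delay_ms(reported, candidates_discovered=False))  # 10
print(effective_delay_ms(reported, candidates_discovered=True))   # 1000

# Options from the test URI: reconnectDelay is set explicitly.
tested = {"maxReconnectAttempts": 2, "reconnectDelay": 4000}
print(effective_delay_ms(tested, candidates_discovered=False))    # 4000
```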

        Jeremy Grodberg added a comment -

        I support the idea of different behavior before the discoveryAgent discovers a broker as opposed to after, since the delay to discover a broker is a completely different set of issues compared to the delay to connect to a broker that is advertising that it is alive. Because of this, I think the different behaviors should be completely separately configurable with regard to delay between attempts, number of attempts, and backoff strategy.

        In any case, though there may be good reasons it evolved this way, I find it counter-intuitive and confusing that it would be the "reconnectDelay" and not the "initialReconnectDelay" that is the reconnect delay used in the initial attempt to find a broker.

        For my immediate needs I can just deal with the current behavior, in part because there is some other bug (perhaps in the JVM or the Win2K TCP stack) that is causing any attempts to discover the broker to fail after the discovery agent has been running for a few minutes. I suspect this is a Windows bug because it affects other processes running in separate JVMs, but I'm not a Windows expert so I don't know how reasonable it is to believe there's this kind of bug still existing in the OS.

        Gary Tully added a comment -

        Agreed, it is not intuitive, but the original value of reconnectDelay is in use until the discovery component returns a list of candidate URLs to connect to.
        I added a reference to the documentation: http://activemq.apache.org/discovery-transport-reference.html

        Re your current problem, I think it is unlikely to be a JVM or OS issue.
        Have you enabled debug logging for the class org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent to see if there is any indicator there? Some of the configuration options on the MulticastDiscoveryAgent may not be appropriate to your scenario.
        The discoveryAgent properties are set in the same way as the group attribute:

        discovery:(multicast://default?group=test&keepAliveInterval=1000)...
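
To make the distinction concrete, the sketch below separates the options inside the parentheses (which go to the discovery agent) from those after the closing parenthesis (which go to the transport). It is a plain string-parsing illustration, not ActiveMQ's own URI handling; `split_discovery_uri` is a hypothetical helper, and the example URI combines options mentioned earlier in this thread.

```python
from urllib.parse import parse_qsl


def split_discovery_uri(uri):
    """Split a composite discovery URI into (agent URI, agent options, transport options)."""
    prefix = "discovery:("
    if not uri.startswith(prefix):
        raise ValueError("not a discovery URI")
    # Everything up to the last ')' is the nested agent URI; the rest is the outer query.
    inner, sep, outer = uri[len(prefix):].rpartition(")")
    if not sep:
        raise ValueError("missing closing parenthesis")
    agent_base, _, agent_query = inner.partition("?")
    transport_query = outer[1:] if outer.startswith("?") else ""
    return agent_base, dict(parse_qsl(agent_query)), dict(parse_qsl(transport_query))


example = (
    "discovery:(multicast://default?group=test&keepAliveInterval=1000)"
    "?reconnectDelay=4000&useExponentialBackOff=false"
)
agent_base, agent_opts, transport_opts = split_discovery_uri(example)
print(agent_base)      # multicast://default
print(agent_opts)      # {'group': 'test', 'keepAliveInterval': '1000'}
print(transport_opts)  # {'reconnectDelay': '4000', 'useExponentialBackOff': 'false'}
```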
        Gary Tully made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 5.2.0 [ 11841 ]
        Resolution Not A Problem [ 6 ]
        Jeremy Grodberg added a comment -

        Thank you for updating the documentation.

        Is there a way to use exponential backoff for reconnects to a discovered URL but constant delays while waiting to discover the broker? If not, can we make that a feature request?

        Re my current problem, yes, I had turned on trace-level debugging for MulticastDiscoveryAgent and it didn't tell me anything other than I had properly set the group. Once the parameter parsing is completed I don't get any more logging until the exception is thrown that the connection failed because no URIs were discovered. It doesn't appear to matter whether I set loopback to true or not (not surprising since nothing but the broker is running on the broker machine).

        Jeremy Grodberg added a comment -

        Also, is this consistent with the behavior of the failover transport? If so, then the failover transport documentation should be updated, too. If not.....

        Jeff Turner made changes -
        Project Import Fri Nov 26 22:32:02 EST 2010 [ 1290828722158 ]
        Transition: Open -> Resolved
        Time In Source Status: 7d 10h 48m
        Execution Times: 1
        Last Executer: Gary Tully
        Last Execution Date: 27/Jan/09 13:08

          People

          • Assignee: Gary Tully
          • Reporter: Jeremy Grodberg
          • Votes: 0
          • Watchers: 1
