ActiveMQ
  1. ActiveMQ
  2. AMQ-4674

ActiveMQ 5.x does not support the notion of a grace-period for heart beats as supported by the STOMP protocol

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 5.8.0
    • Fix Version/s: 5.9.0
    • Component/s: None
    • Labels:

      Description

      Regarding the configuration of heart beating the STOMP protocol spec states:

      "- because of timing inaccuracies, the receiver SHOULD be tolerant and take into account an error margin"

      However, it appears that ActiveMQ 5.x is not tolerant of any error margin.

      Despite the fact that the spec says SHOULD rather than MUST it would make the implementation of STOMP clients easier if the error margin was published.

      As the broker aggressively enforces the heart beat timeouts false failover attempts can result.

      Apparently Apollo supports an error margin of 1.5x the configured heart beat. If it could be made configurable that would be even better!

        Issue Links

          Activity

          Paul Gale created issue -
          Paul Gale made changes -
          Field Original Value New Value
          Issue Type New Feature [ 2 ] Bug [ 1 ]
          Paul Gale made changes -
          Fix Version/s 5.9.0 [ 12323932 ]
          Fix Version/s NEEDS_REVIEWED [ 12315630 ]
          Paul Gale made changes -
          Description Regarding the configuration of heart beating the STOMP protocol spec states:

              "- because of timing inaccuracies, the receiver SHOULD be tolerant and take into account an error margin"

          However, it appears that ActiveMQ 5.x is not tolerant of any error margin.

          Despite the fact that the spec says SHOULD rather than MUST it would make the implementation of STOMP clients easier if the error margin was published.

          Apparently Apollo supports an error margin of 1.5x the configured heart beat. If it could be made configurable that would be even better!
          Regarding the configuration of heart beating the STOMP protocol spec states:

              "- because of timing inaccuracies, the receiver SHOULD be tolerant and take into account an error margin"

          However, it appears that ActiveMQ 5.x is not tolerant of any error margin.

          Despite the fact that the spec says SHOULD rather than MUST it would make the implementation of STOMP clients easier if the error margin was published.

          As the broker aggressively enforces the heart beat timeouts false failover attempts can result.

          Apparently Apollo supports an error margin of 1.5x the configured heart beat. If it could be made configurable that would be even better!
          Hide
          Timothy Bish added a comment -

          Added transport options "?transport.hbGracePeriodMultiplier" which is used to add a user defined grace period to the read check interval indicated by the connecting STOMP client. By default this value is set at 1.0 in order to keep backwards compat with clients that don't expect a grace period. Can be set to values greater than 1.0f and the read check interval will be multiplied by this value.

          Show
          Timothy Bish added a comment - Added transport options "?transport.hbGracePeriodMultiplier" which is used to add a user defined grace period to the read check interval indicated by the connecting STOMP client. By default this value is set at 1.0 in order to keep backwards compat with clients that don't expect a grace period. Can be set to values greater than 1.0f and the read check interval will be multiplied by this value.
          Timothy Bish made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Assignee Timothy Bish [ tabish121 ]
          Resolution Fixed [ 1 ]
          Hide
          Paul Gale added a comment -

          I think something has gone awry here. This implementation of a 'grace period' doesn't really help.

          What I am observing, after configuring the grace period with a value of 1.5, is that when a client connects (simplified):

          CONNECT
          heart-beat:5000,0

          the broker responds with:

          CONNECTED
          heart-beat:0,7500

          which is incorrect. The grace period multiplier should NOT affect the broker's response. The broker should have responded with 5000.

          The client therefore honors the broker's inflated interval (per the STOMP spec) and sends a read check every 7500ms.

          The broker, meanwhile, applies the grace period multiplier, a second time, to its inflated read check interval. In this case, it now performs a read check every 11250ms (as shown in the DEBUG logging of the AbstractInactivityMonitor).

          With the client idle the broker's activemq.log contains TRACE log entries that read "A receive is in progress" every 11250ms. I presume this is the ReadCheckTimer handling the read check sent from the client? If so, the log output could be a little more descriptive.

          Show
          Paul Gale added a comment - I think something has gone awry here. This implementation of a 'grace period' doesn't really help. What I am observing, after configuring the grace period with a value of 1.5, is that when a client connects (simplified): CONNECT heart-beat:5000,0 the broker responds with: CONNECTED heart-beat:0,7500 which is incorrect. The grace period multiplier should NOT affect the broker's response. The broker should have responded with 5000. The client therefore honors the broker's inflated interval (per the STOMP spec) and sends a read check every 7500ms. The broker, meanwhile, applies the grace period multiplier, a second time, to its inflated read check interval. In this case, it now performs a read check every 11250ms (as shown in the DEBUG logging of the AbstractInactivityMonitor). With the client idle the broker's activemq.log contains TRACE log entries that read "A receive is in progress" every 11250ms. I presume this is the ReadCheckTimer handling the read check sent from the client? If so, the log output could be a little more descriptive.
          Hide
          Timothy Bish added a comment -

          submit some unit tests.

          Show
          Timothy Bish added a comment - submit some unit tests.
          Hide
          Paul Gale added a comment - - edited

          I have no idea how to write unit tests for ActiveMQ and this section of code in particular.

          Unfortunately I could not find any unit tests for this feature which is surprising given that it was added just the other day. How was it tested? It's a tad galling to be asked for some tests now.

          I don't mind modifying existing tests though. However, my personal experience with the unit test codebase is that they're rather flaky; they almost never pass whenever I've tried to run them which doesn't exactly entice me into wanting to try now.

          Nonetheless, from a quick analysis of the code it would appear that the offending code is in activemq-stomp/src/main/java/org/apache/activemq/transport/stomp/ProtocolConverter.java at line 929:

              hbReadInterval = (long) (Long.parseLong(keepAliveOpts[0]) * hbGracePeriodMultiplier); // Wrong!
          

          should be:

              hbReadInterval = (long) Long.parseLong(keepAliveOpts[0]); //  Honor the client's read interval
          

          where keepAliveOpts[0] is the client specified heartbeat read-interval.

          When the inactivity monitor's read check time is calculated it's done correctly:

          StompInactivityMonitor monitor = this.stompTransport.getInactivityMonitor();
          monitor.setReadCheckTime((long) (hbReadInterval * hbGracePeriodMultiplier));  // Correct
          monitor.setInitialDelayTime(Math.min(hbReadInterval, hbWriteInterval));
          monitor.setWriteCheckTime(hbWriteInterval);
          monitor.startMonitoring();
          
          Setup:    keepAliveOpts[0] = 5000, hbGracePeriodMultiplier = 1.5
          
          Expected: hbReadInterval == 5000, monitor.getReadCheckTime() == 7500
          Actual:   hbReadInterval == 7500, monitor.getReadCheckTime() == 11250
          
          Show
          Paul Gale added a comment - - edited I have no idea how to write unit tests for ActiveMQ and this section of code in particular. Unfortunately I could not find any unit tests for this feature which is surprising given that it was added just the other day. How was it tested? It's a tad galling to be asked for some tests now. I don't mind modifying existing tests though. However, my personal experience with the unit test codebase is that they're rather flaky; they almost never pass whenever I've tried to run them which doesn't exactly entice me into wanting to try now. Nonetheless, from a quick analysis of the code it would appear that the offending code is in activemq-stomp/src/main/java/org/apache/activemq/transport/stomp/ProtocolConverter.java at line 929: hbReadInterval = ( long ) ( Long .parseLong(keepAliveOpts[0]) * hbGracePeriodMultiplier); // Wrong! should be: hbReadInterval = ( long ) Long .parseLong(keepAliveOpts[0]); // Honor the client's read interval where keepAliveOpts [0] is the client specified heartbeat read-interval. When the inactivity monitor's read check time is calculated it's done correctly: StompInactivityMonitor monitor = this .stompTransport.getInactivityMonitor(); monitor.setReadCheckTime(( long ) (hbReadInterval * hbGracePeriodMultiplier)); // Correct monitor.setInitialDelayTime( Math .min(hbReadInterval, hbWriteInterval)); monitor.setWriteCheckTime(hbWriteInterval); monitor.startMonitoring(); Setup: keepAliveOpts[0] = 5000, hbGracePeriodMultiplier = 1.5 Expected: hbReadInterval == 5000, monitor.getReadCheckTime() == 7500 Actual: hbReadInterval == 7500, monitor.getReadCheckTime() == 11250
          Hide
          Timothy Bish added a comment -

          Already fixed earlier, should be in the latest nightly.

          Show
          Timothy Bish added a comment - Already fixed earlier, should be in the latest nightly.
          Charlie Mordant made changes -
          Link This issue is a clone of AMQ-5168 [ AMQ-5168 ]

            People

            • Assignee:
              Timothy Bish
              Reporter:
              Paul Gale
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development