Uploaded image for project: 'ActiveMQ'
  1. ActiveMQ
  2. AMQ-6666

Failover Transport - send timeout not working

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 5.14.0
    • Fix Version/s: None
    • Component/s: Broker
    • Labels:
      None

      Description

      Running into a situation with the Failover Transport not respecting the timeout that's been set. The symptom is endless messages of this kind:

      2017-04-29 09:48:26,128 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
      2017-04-29 09:48:26,228 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
      2017-04-29 09:48:26,329 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
      2017-04-29 09:48:26,429 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
      2017-04-29 09:48:26,530 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
      ...
      2017-04-29 09:48:33,270 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
      2017-04-29 09:48:33,371 | TRACE | .engine.cfgengine.in]-11 | FailoverTransport                | sport.failover.FailoverTransport 615 | 81 - org.apache.activemq.activemq-osgi - 5.14.0 | Waiting for transport to reconnect..: TransactionInfo {commandId = 127798, responseRequired = true, type = 7, connectionId = ID:inucdev4-57330-1493370444659-3:3, transactionId = XID:[1096044365,globalId=6374726c6366672d656e67696e653130333530323030303034,branchId=6374726c6366672d656e67696e6531313036383134]}
      

      The code seems to never get out of this loop:

          while (transport == null && !disposed && connectionFailure == null
                  && !Thread.currentThread().isInterrupted() && willReconnect()) {
      
              LOG.trace("Waiting for transport to reconnect..: {}", command);
              long end = System.currentTimeMillis();
              if (command.isMessage() && timeout > 0 && (end - start > timeout)) {
                  timedout = true;
                  LOG.info("Failover timed out after {} ms", (end - start));
                  break;
              }
              try {
                  reconnectMutex.wait(100);
              } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
                  LOG.debug("Interupted:", e);
              }
              transport = connectedTransport.get();
          }
      

      The timeout is set to 5000ms and should have hit a long time ago, but as "command.isMessage()" returns false, it endlessly stays inside the loop.

      The "command.isMessage()" condition should likely be removed.

      Currently running tests with a patched ActiveMQ, and the situation has improved and fail-over seems to have worked (mostly). (Only seeing an issue with a topic consumer that has not reconnected.)

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                Unassigned
                Reporter:
                lichtin Martin Lichtin
              • Votes:
                3 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated: