ActiveMQ
  1. ActiveMQ
  2. AMQ-2730

Transport failed: java.io.EOFException

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Not a Problem
    • Affects Version/s: 5.3.1
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Environment:

      Red hat enterprise

      Description

      We see many Exceptions of this type in the broker logs:
      2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///10.231.233.55:50809
      java.io.EOFException
      at java.io.DataInputStream.readInt(DataInputStream.java:375)
      at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
      at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:211)
      at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203)
      at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:186)
      at java.lang.Thread.run(Thread.java:619)

      This exception occurs with a lot of clients that run on different machines.

      One case where we almost always see this exception is when shutting down a bridge. We did a tcp dump on the broker and on the client machine: there a no packets lost.

      LOG ON BRIDGE:
      2010-05-12 14:10:02,634 INFO org.apache.activemq.broker.BrokerService - ActiveMQ Message Broker (solr-2073bridge, ID:solr-2073-50804-
      1273665033782-0:0) is shutting down
      2010-05-12 14:10:02,799 INFO org.apache.activemq.network.jms.JmsConnector - JMS Connector Connector:0 Stopped
      2010-05-12 14:10:02,819 INFO org.apache.activemq.broker.BrokerService - ActiveMQ JMS Message Broker (solr-2073bridge, ID:solr-2073-50
      804-1273665033782-0:0) stopped

      LOG ON BROKER:
      2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ Transport: tcp:///10.231.233.55:50809
      java.io.EOFException
      at java.io.DataInputStream.readInt(DataInputStream.java:375)
      at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
      at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:211)
      at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:203)
      at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:186)
      at java.lang.Thread.run(Thread.java:619)
      2010-05-12 14:10:03,492 | DEBUG | Stopping connection: /10.231.233.55:50809 | org.apache.activemq.broker.TransportConnection | ActiveMQ Task
      2010-05-12 14:10:03,492 | DEBUG | Stopping transport tcp:///10.231.233.55:50809 | org.apache.activemq.transport.tcp.TcpTransport | ActiveMQ Task
      2010-05-12 14:10:03,493 | DEBUG | Stopped transport: /10.231.233.55:50809 | org.apache.activemq.broker.TransportConnection | ActiveMQ Task
      2010-05-12 14:10:03,493 | DEBUG | Cleaning up connection resources: /10.231.233.55:50809 | org.apache.activemq.broker.TransportConnection | ActiveMQ Task
      2010-05-12 14:10:03,493 | DEBUG | remove connection id: ID:solr-2073-50804-1273665033782-2:0 | org.apache.activemq.broker.TransportConnection | ActiveMQ Task
      2010-05-12 14:10:03,493 | DEBUG | jms1 removing consumer: ID:solr-2073-50804-1273665033782-2:0:225:1 for destination: temp-queue://ID:solr-2073-50804-1273665033782-2:0:105 | org.apache.activemq.broker.region.AbstractRegion | ActiveMQ Task
      2010-05-12 14:10:03,493 | DEBUG | remove sub: QueueSubscription: consumer=ID:solr-2073-50804-1273665033782-2:0:225:1, destinations=1, dispatched=0, delivered=0, pending=0, lastDeliveredSeqId: 0, dequeues: 0, dispatched: 0, inflight: 0 | org.apache.activemq.broker.region.Queue | ActiveMQ Task

      1. TCP Dump.jpg
        606 kB
        Dennis Klinkott

        Activity

        Hide
        Dennis Klinkott added a comment -

        The screenshot shows the communication between the broker (ip: 10.231.233.125) and the bridge (solr-2073, ip: 10.231.233.55) at the time the exception occurred.

        Show
        Dennis Klinkott added a comment - The screenshot shows the communication between the broker (ip: 10.231.233.125) and the bridge (solr-2073, ip: 10.231.233.55) at the time the exception occurred.
        Hide
        Dennis Klinkott added a comment -

        The strange thing is that we get the EOFException shortly before the connection is stopped:

        2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException

        2010-05-12 14:10:03,492 | DEBUG | Stopping connection

        Show
        Dennis Klinkott added a comment - The strange thing is that we get the EOFException shortly before the connection is stopped: 2010-05-12 14:10:03,490 | DEBUG | Transport failed: java.io.EOFException 2010-05-12 14:10:03,492 | DEBUG | Stopping connection
        Hide
        Bruce Snyder added a comment - - edited

        The EOFException is difficult to explain because there is no additional info pointing at what may have happened, i.e., it's simply an abrupt failure. The important question here is, 'What is causing the abrupt failure?'

        Just a few of questions:

        • Can you reproduce the errors?
          • If so, can you sniff the network (e.g., using Wireshark) to see what is happening just before the exception occurs?
        • Is there a client-side exception to match the exception on the broker side?
        • Is the broker running in a virtualized environment perhaps?
        Show
        Bruce Snyder added a comment - - edited The EOFException is difficult to explain because there is no additional info pointing at what may have happened, i.e., it's simply an abrupt failure. The important question here is, 'What is causing the abrupt failure?' Just a few of questions: Can you reproduce the errors? If so, can you sniff the network (e.g., using Wireshark) to see what is happening just before the exception occurs? Is there a client-side exception to match the exception on the broker side? Is the broker running in a virtualized environment perhaps?
        Hide
        IP added a comment -

        Same problem here. I tried versions 5.4.0 and 5.4.1.

        On client-side there is an exception:
        javax.jms.JMSException: Wire format negotiation timeout: peer did not send his wire format.
        at org.apache.activemq.util.JMSExceptionSupport.create (JMSExceptionSupport.java:62)
        org.apache.activemq.ActiveMQConnection.syncSendPacket (ActiveMQConnection.java:1298)
        org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent (ActiveMQConnection.java:1382)
        org.apache.activemq.ActiveMQConnection.createSession (ActiveMQConnection.java:309)
        sun.reflect.GeneratedMethodAccessor18.invoke (:-1)
        sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:25)
        java.lang.reflect.Method.invoke (Method.java:597)
        clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:90)
        clojure.lang.Reflector.invokeInstanceMethod (Reflector.java:28)
        ...
        Caused by: java.io.IOException: Wire format negotiation timeout: peer did not send his wire format.
        at org.apache.activemq.transport.WireFormatNegotiator.oneway (WireFormatNegotiator.java:98)
        org.apache.activemq.transport.MutexTransport.oneway (MutexTransport.java:40)
        org.apache.activemq.transport.ResponseCorrelator.asyncRequest (ResponseCorrelator.java:81)
        org.apache.activemq.transport.ResponseCorrelator.request (ResponseCorrelator.java:86)
        org.apache.activemq.ActiveMQConnection.syncSendPacket (ActiveMQConnection.java:1276)
        org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent (ActiveMQConnection.java:1382)
        org.apache.activemq.ActiveMQConnection.createSession (ActiveMQConnection.java:309)
        sun.reflect.GeneratedMethodAccessor18.invoke (:-1)
        sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:25)
        java.lang.reflect.Method.invoke (Method.java:597)

        Show
        IP added a comment - Same problem here. I tried versions 5.4.0 and 5.4.1. On client-side there is an exception: javax.jms.JMSException: Wire format negotiation timeout: peer did not send his wire format. at org.apache.activemq.util.JMSExceptionSupport.create (JMSExceptionSupport.java:62) org.apache.activemq.ActiveMQConnection.syncSendPacket (ActiveMQConnection.java:1298) org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent (ActiveMQConnection.java:1382) org.apache.activemq.ActiveMQConnection.createSession (ActiveMQConnection.java:309) sun.reflect.GeneratedMethodAccessor18.invoke (:-1) sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke (Method.java:597) clojure.lang.Reflector.invokeMatchingMethod (Reflector.java:90) clojure.lang.Reflector.invokeInstanceMethod (Reflector.java:28) ... Caused by: java.io.IOException: Wire format negotiation timeout: peer did not send his wire format. at org.apache.activemq.transport.WireFormatNegotiator.oneway (WireFormatNegotiator.java:98) org.apache.activemq.transport.MutexTransport.oneway (MutexTransport.java:40) org.apache.activemq.transport.ResponseCorrelator.asyncRequest (ResponseCorrelator.java:81) org.apache.activemq.transport.ResponseCorrelator.request (ResponseCorrelator.java:86) org.apache.activemq.ActiveMQConnection.syncSendPacket (ActiveMQConnection.java:1276) org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent (ActiveMQConnection.java:1382) org.apache.activemq.ActiveMQConnection.createSession (ActiveMQConnection.java:309) sun.reflect.GeneratedMethodAccessor18.invoke (:-1) sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:25) java.lang.reflect.Method.invoke (Method.java:597)
        Hide
        Bruce Snyder added a comment - - edited

        @IP, That error is different and can occur for a couple of reasons:

        1. The resources available to the broker are so consumed that there are not enough available to spawn the necessary thread to allow the handshake or the broker is so slow in creating the thread that a timeout occurs before the handshake.
          • The solution is to either free up resources or give the broker more resources (i.e., memory).
        2. A client connects to a port that is not an endpoint for the TCP transport. Even though the client connects to the broker, the full handshake is not completed and a timeout occurs.
          • The solution is to connect to the correct port number.

        Actually there is a FAQ entry for this problem that says basically the same things:

        http://activemq.apache.org/javaxjmsjmsexception-wire-format-negociation-timeout-peer-did-not-send-his-wire-format.html

        Show
        Bruce Snyder added a comment - - edited @IP, That error is different and can occur for a couple of reasons: The resources available to the broker are so consumed that there are not enough available to spawn the necessary thread to allow the handshake or the broker is so slow in creating the thread that a timeout occurs before the handshake. The solution is to either free up resources or give the broker more resources (i.e., memory). A client connects to a port that is not an endpoint for the TCP transport. Even though the client connects to the broker, the full handshake is not completed and a timeout occurs. The solution is to connect to the correct port number. Actually there is a FAQ entry for this problem that says basically the same things: http://activemq.apache.org/javaxjmsjmsexception-wire-format-negociation-timeout-peer-did-not-send-his-wire-format.html
        Hide
        Sergey Rem added a comment -

        Hi ,
        1. Do You mean configuration of memory in ApacheMQ in bin Directory? It looks like -it is not reason of problem?

        1. Set jvm memory configuration
          ACTIVEMQ_OPTS_MEMORY="-Xms256M -Xmx512M"

        2. What do You mean? It is Stomp configurationc for ApacheMQ: there is only Port 61612. How can client try to contact other ports? Is it possible?

        3. FAQ. Only one configuration's property works with STOMP protocol. so it doesn't work:
        <transportConnector name="stomp" uri="stomp://0.0.0.0:61612?transport.closeAsync=false&soTimeout=30000&wireFormat.maxInactivityDurationInitalDelay=30000"/>

        (only one property is possible, for example ....?wireFormat.maxInactivityDurationInitalDelay=30000"/> ):

        so by me it is not solved... or do i right follow You?

        Show
        Sergey Rem added a comment - Hi , 1. Do You mean configuration of memory in ApacheMQ in bin Directory? It looks like -it is not reason of problem? Set jvm memory configuration ACTIVEMQ_OPTS_MEMORY="-Xms256M -Xmx512M" 2. What do You mean? It is Stomp configurationc for ApacheMQ: there is only Port 61612. How can client try to contact other ports? Is it possible? 3. FAQ. Only one configuration's property works with STOMP protocol. so it doesn't work: <transportConnector name="stomp" uri="stomp://0.0.0.0:61612?transport.closeAsync=false&soTimeout=30000&wireFormat.maxInactivityDurationInitalDelay=30000"/> (only one property is possible, for example ....?wireFormat.maxInactivityDurationInitalDelay=30000"/> ): so by me it is not solved... or do i right follow You?
        Hide
        Sergey Rem added a comment -

        Hi,
        what can i configure in case of using Loadbalancer?
        It sends constantly check bytes to Port 61612...
        So exceptions come in Stomp Protocol -error of reading of line...
        What is right solution?

        Show
        Sergey Rem added a comment - Hi, what can i configure in case of using Loadbalancer? It sends constantly check bytes to Port 61612... So exceptions come in Stomp Protocol -error of reading of line... What is right solution?
        Hide
        Bruce Snyder added a comment -

        1. Do You mean configuration of memory in ApacheMQ in bin Directory? It looks like -it is not reason of problem?
        Set jvm memory configuration
        ACTIVEMQ_OPTS_MEMORY="-Xms256M -Xmx512M"

        Yes, one possibility is memory resources have been exhausted. By default, the ActiveMQ startup script allocates 512mb to the JVM in which ActiveMQ runs. In addition to the memory allocated to the JVM, you also need to configure the memory that is made available to the broker via the <systemUsage> element in the activemq.xml file. See the docs for the <systemUsage> element for more info:

        http://activemq.apache.org/producer-flow-control.html#ProducerFlowControl-Systemusage

        2. What do You mean? It is Stomp configurationc for ApacheMQ: there is only Port 61612. How can client try to contact other ports? Is it possible?

        I mean that the client may be connecting to a port that does not have a TCP transport connector registered to it. It is possible for a client to use TCP to connect to ActiveMQ via a port that has a non-TCP endpoint registered. However, the client will not get a full handshake and an error will be thrown.

        3. FAQ. Only one configuration's property works with STOMP protocol. so it doesn't work:
        <transportConnector name="stomp" uri="stomp://0.0.0.0:61612?transport.closeAsync=false&soTimeout=30000&wireFormat.maxInactivityDurationInitalDelay=30000"/>

        FAQ item #3 is referring to the failover transport connector which is used on the client-side, not the Stomp transport connector which is used on the broker-side. Here are the docs for the failover transport:

        http://activemq.apache.org/failover-transport-reference.html

        Hi,
        what can i configure in case of using Loadbalancer?
        It sends constantly check bytes to Port 61612...
        So exceptions come in Stomp Protocol -error of reading of line...
        What is right solution?

        Typically people have a TCP or NIO transport connector defined in the broker config for this purpose. The load balancer just opens a socket with the broker via the TCP transport and if it is able to connect, it assumes that the broker is up and running. More sophisticated setups that I have seen actually have the load balancer run a script that connects the broker and sends/receives a message on a destination to ensure that the broker is actually processing messages.

        Show
        Bruce Snyder added a comment - 1. Do You mean configuration of memory in ApacheMQ in bin Directory? It looks like -it is not reason of problem? Set jvm memory configuration ACTIVEMQ_OPTS_MEMORY="-Xms256M -Xmx512M" Yes, one possibility is memory resources have been exhausted. By default, the ActiveMQ startup script allocates 512mb to the JVM in which ActiveMQ runs. In addition to the memory allocated to the JVM, you also need to configure the memory that is made available to the broker via the <systemUsage> element in the activemq.xml file. See the docs for the <systemUsage> element for more info: http://activemq.apache.org/producer-flow-control.html#ProducerFlowControl-Systemusage 2. What do You mean? It is Stomp configurationc for ApacheMQ: there is only Port 61612. How can client try to contact other ports? Is it possible? I mean that the client may be connecting to a port that does not have a TCP transport connector registered to it. It is possible for a client to use TCP to connect to ActiveMQ via a port that has a non-TCP endpoint registered. However, the client will not get a full handshake and an error will be thrown. 3. FAQ. Only one configuration's property works with STOMP protocol. so it doesn't work: <transportConnector name="stomp" uri="stomp://0.0.0.0:61612?transport.closeAsync=false&soTimeout=30000&wireFormat.maxInactivityDurationInitalDelay=30000"/> FAQ item #3 is referring to the failover transport connector which is used on the client-side, not the Stomp transport connector which is used on the broker-side. Here are the docs for the failover transport: http://activemq.apache.org/failover-transport-reference.html Hi, what can i configure in case of using Loadbalancer? It sends constantly check bytes to Port 61612... So exceptions come in Stomp Protocol -error of reading of line... What is right solution? Typically people have a TCP or NIO transport connector defined in the broker config for this purpose. The load balancer just opens a socket with the broker via the TCP transport and if it is able to connect, it assumes that the broker is up and running. More sophisticated setups that I have seen actually have the load balancer run a script that connects the broker and sends/receives a message on a destination to ensure that the broker is actually processing messages.
        Hide
        Sergey Rem added a comment - - edited

        Thanks!
        Can be broker configured so that it will ignore check bytes of Loadbalncer to STOMP connector Port and will not write Exception in log file each second ?
        NIO has the same trouble if You meaned that NIO transport connector would not see check bytes of Loadbalancer as Exceptions by reading of STOMP messages...
        Sure java source can be changed to ignore some exceptions or such bytes, but it is not good solution - new versions Apache MQ need than such changes in case of update.

        {Typically people have a TCP or NIO transport connector defined in the broker config for this purpose. The load balancer just opens a socket with the broker via the TCP transport and if it is able to connect, it assumes that the broker is up and running. More sophisticated setups that I have seen actually have the load balancer run a script that connects the broker and sends/receives a message on a destination to ensure that the broker is actually processing messages. }
        Show
        Sergey Rem added a comment - - edited Thanks! Can be broker configured so that it will ignore check bytes of Loadbalncer to STOMP connector Port and will not write Exception in log file each second ? NIO has the same trouble if You meaned that NIO transport connector would not see check bytes of Loadbalancer as Exceptions by reading of STOMP messages... Sure java source can be changed to ignore some exceptions or such bytes, but it is not good solution - new versions Apache MQ need than such changes in case of update. {Typically people have a TCP or NIO transport connector defined in the broker config for this purpose. The load balancer just opens a socket with the broker via the TCP transport and if it is able to connect, it assumes that the broker is up and running. More sophisticated setups that I have seen actually have the load balancer run a script that connects the broker and sends/receives a message on a destination to ensure that the broker is actually processing messages. }
        Hide
        Sergey Rem added a comment -

        Hello Bruse,

        do You have any example of xml configuration file or its part (is it appachemq.xml or spring confi xml?) of adding own transporter listerner?
        so that this exception would be processed by own transport listener.
        i haven't found any example of xml.

        Thanks in advance!

        Show
        Sergey Rem added a comment - Hello Bruse, do You have any example of xml configuration file or its part (is it appachemq.xml or spring confi xml?) of adding own transporter listerner? so that this exception would be processed by own transport listener. i haven't found any example of xml. Thanks in advance!
        Hide
        Bruce Snyder added a comment -

        @Sergey, A transport listener can be set on the ActiveMQ connection factory to monitor the status of the failover transport from the client side, not from the broker side. Using this method, every connection that is created via this connection factory will have the transport listener enabled. Below is an example using a Spring XML config:

        <bean class="org.apache.activemq.ActiveMQConnectionFactory">
          <property name="brokerURL" value="failover:(tcp://localhost:61616)" />
          <property name="transportListener">
            <bean class="com.mycompany.MyTransportListener" />
          </property>
        </bean>
        

        A transport listener can also be set on the ActiveMQ connection directly, but this is usually done programmatically. Below is an example using Java code:

        ...
        Connection conn = connectionFactory.createConnection();
        conn.addTransportListener(new MyTransportListener() {
          public void onCommand(Object o) {
            System.out.println("onCommand invoked"); 
          }
        
          public void onException(IOException error) {
            System.out.println("onException invoked"); 
          }
        });
        ...
        

        A transport listener can also be added directly to a Transport, but this is only done when working directly with a Transport. Below is an example of doing this using Java code:

        ...
        transport.setTransportListener(new DefaultTransportListener() {
          public void onCommand(Object o) {
            System.out.println("onCommand invoked"); 
          }
        
          public void onException(IOException error) {
            System.out.println("onException invoked"); 
          }
        });
        ...
        

        I hope these examples help answer your questions.

        Show
        Bruce Snyder added a comment - @Sergey, A transport listener can be set on the ActiveMQ connection factory to monitor the status of the failover transport from the client side, not from the broker side. Using this method, every connection that is created via this connection factory will have the transport listener enabled. Below is an example using a Spring XML config: <bean class= "org.apache.activemq.ActiveMQConnectionFactory" > <property name= "brokerURL" value= "failover:(tcp://localhost:61616)" /> <property name= "transportListener" > <bean class= "com.mycompany.MyTransportListener" /> </property> </bean> A transport listener can also be set on the ActiveMQ connection directly, but this is usually done programmatically. Below is an example using Java code: ... Connection conn = connectionFactory.createConnection(); conn.addTransportListener( new MyTransportListener() { public void onCommand( Object o) { System .out.println( "onCommand invoked" ); } public void onException(IOException error) { System .out.println( "onException invoked" ); } }); ... A transport listener can also be added directly to a Transport , but this is only done when working directly with a Transport . Below is an example of doing this using Java code: ... transport.setTransportListener( new DefaultTransportListener() { public void onCommand( Object o) { System .out.println( "onCommand invoked" ); } public void onException(IOException error) { System .out.println( "onException invoked" ); } }); ... I hope these examples help answer your questions.
        Hide
        Scott Emmons added a comment -

        Related to the ORIGINAL description of this bug, here's our experience in debugging EOFException errors with AMQ 5.4.2 on linux, using a failover connector:

        We just spent some time debugging a similar EOFException situation and believe the cause is memory related on the ActiveMQ server, even though the server may not ever throw an OutOfMemory exception. In our case, this cause was not the least bit obvious as the server would appear to function fine, but at some point the inactivity timer would kick in and cause reconnects. When this would happen, new connections (from our Java clients) to the server would be immediately closed and the client would start hammering the server up until the max reconnect attempts is hit (even though the documentation says the default is unlimited, it's actually 1000 in TaskRunnerFactory - due to DefaultThreadPools having what we think is the right code commented out, unless that means something different). Because of synchronization in this code, a single thread will hit this loop, while all the other threads are stuck waiting on the mutex (which is fine, because nobody can do any work anyway). Once the 1000 connection attempts are done, the client JVM becomes effectively useless for any JMS activity, as the thread in the reconnection loop will give up and end up waiting forever on an event that will never occur. All other threads are stuck waiting on this mutex.

        We have seen many cases where restarting the client appears fine (it reconnects to ActiveMQ server just fine), giving a false sense that all is well. At some point, the client will once again get into this state. Because of this, we

        While there is little the client code can do in this situation, there are things that can be done to help with this situation. Here's my suggestions, if it helps:

        1. In the TaskRunnerFactory, if we hit the max reconnect attempts, a high severity log entry should be written saying that AMQ client is effectively dead and giving up. This message should repeat at some reasonable interval so it doesn't get buried in logs.

        2. Improve AMQ client logging when it is this reconnect loop. We found that we had to enable TRACE level logging to really get enough detail to see what was going on.

        3. Improve AMQ server logging, if possible, to indicate why it is closing sockets from the client.

        4. Validate if the current code in DefaultThreadPools is indeed correct (not the commented out code).

        5. I don't know what the default memoryUsage limit is in the example activemq.xml, but make sure that it's a sane value considering the default max heap is 512MB.

        Show
        Scott Emmons added a comment - Related to the ORIGINAL description of this bug, here's our experience in debugging EOFException errors with AMQ 5.4.2 on linux, using a failover connector: We just spent some time debugging a similar EOFException situation and believe the cause is memory related on the ActiveMQ server, even though the server may not ever throw an OutOfMemory exception. In our case, this cause was not the least bit obvious as the server would appear to function fine, but at some point the inactivity timer would kick in and cause reconnects. When this would happen, new connections (from our Java clients) to the server would be immediately closed and the client would start hammering the server up until the max reconnect attempts is hit (even though the documentation says the default is unlimited, it's actually 1000 in TaskRunnerFactory - due to DefaultThreadPools having what we think is the right code commented out, unless that means something different). Because of synchronization in this code, a single thread will hit this loop, while all the other threads are stuck waiting on the mutex (which is fine, because nobody can do any work anyway). Once the 1000 connection attempts are done, the client JVM becomes effectively useless for any JMS activity, as the thread in the reconnection loop will give up and end up waiting forever on an event that will never occur. All other threads are stuck waiting on this mutex. We have seen many cases where restarting the client appears fine (it reconnects to ActiveMQ server just fine), giving a false sense that all is well. At some point, the client will once again get into this state. Because of this, we While there is little the client code can do in this situation, there are things that can be done to help with this situation. Here's my suggestions, if it helps: 1. In the TaskRunnerFactory, if we hit the max reconnect attempts, a high severity log entry should be written saying that AMQ client is effectively dead and giving up. This message should repeat at some reasonable interval so it doesn't get buried in logs. 2. Improve AMQ client logging when it is this reconnect loop. We found that we had to enable TRACE level logging to really get enough detail to see what was going on. 3. Improve AMQ server logging, if possible, to indicate why it is closing sockets from the client. 4. Validate if the current code in DefaultThreadPools is indeed correct (not the commented out code). 5. I don't know what the default memoryUsage limit is in the example activemq.xml, but make sure that it's a sane value considering the default max heap is 512MB.
        Hide
        Scott Emmons added a comment -

        In my previous comment, I forgot a couple of important details.

        First, the memory condition was a result of having memoryUsage limit of "512 mb" with a max heap of 512MB due to a misconfiguration in this particular environment.

        Second, stack trace for the java.io.EOFException:

        java.io.DataInputStream.readInt(DataInputStream.java:358)
        org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
        org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:227)
        org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:219)
        org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202)
        java.lang.Thread.run(Thread.java:595)

        Show
        Scott Emmons added a comment - In my previous comment, I forgot a couple of important details. First, the memory condition was a result of having memoryUsage limit of "512 mb" with a max heap of 512MB due to a misconfiguration in this particular environment. Second, stack trace for the java.io.EOFException: java.io.DataInputStream.readInt(DataInputStream.java:358) org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269) org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:227) org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:219) org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202) java.lang.Thread.run(Thread.java:595)
        Hide
        Alex Soto added a comment -

        Running into this exact same problem, sometimes resulting in "Too many open files" error that really renders the server unusable. Adjusting memory settings as suggested by Scott Emmons seem to help with the "Too many open files" errors, but I am still seeing the "Transport failed: java.io.EOFException" errors in the server. I am afraid this error will cause some other problem eventually. Please help!

        Show
        Alex Soto added a comment - Running into this exact same problem, sometimes resulting in "Too many open files" error that really renders the server unusable. Adjusting memory settings as suggested by Scott Emmons seem to help with the "Too many open files" errors, but I am still seeing the "Transport failed: java.io.EOFException" errors in the server. I am afraid this error will cause some other problem eventually. Please help!
        Hide
        Scott Emmons added a comment -

        A couple of comments on this...

        1 - TaskRunnerFactory bug mentioned above has not, to my knowledge been fixed. This is a rather serious condition in that ActiveMQ will only try 1000 times (regardless of how configured) if the configuration is as described above, and then give up - silently and forever wedging the connection.

        2. Since my comments a few months back, we have discovered an additional, and very common source of EOFExceptions - and it's not a problem, but the logging of ActiveMQ is sort of misleading. Whenever a client closes a JMS connection under normal conditions, the server will report an EOFException. This also can happen in a network of brokers after a connection is idle for some period of time. Unfortunately, the logging does not disambiguate EOFExceptions as a result of normal connection shutdown (it could, if the client reported this fact to the server). On our spring-based clients we see this alot, because it creates/destroys JMS connections - but I don't know if this is due to bad code or configuration on our part, or is a limitation of spring.

        Hope this helps lead in the right direction...

        Show
        Scott Emmons added a comment - A couple of comments on this... 1 - TaskRunnerFactory bug mentioned above has not, to my knowledge been fixed. This is a rather serious condition in that ActiveMQ will only try 1000 times (regardless of how configured) if the configuration is as described above, and then give up - silently and forever wedging the connection. 2. Since my comments a few months back, we have discovered an additional, and very common source of EOFExceptions - and it's not a problem, but the logging of ActiveMQ is sort of misleading. Whenever a client closes a JMS connection under normal conditions, the server will report an EOFException. This also can happen in a network of brokers after a connection is idle for some period of time. Unfortunately, the logging does not disambiguate EOFExceptions as a result of normal connection shutdown (it could, if the client reported this fact to the server). On our spring-based clients we see this alot, because it creates/destroys JMS connections - but I don't know if this is due to bad code or configuration on our part, or is a limitation of spring. Hope this helps lead in the right direction...
        Hide
        LinZuXiong added a comment - - edited

        Transport failed: java.io.EOFException
        ActiveMQ 5.5, too.

        And ActiveMQ cannot restore.
        cd $

        {mq_dir}

        /bin
        way 1: ./activemq stop
        then ./activemq start
        way 2: ./activemq restart

        when every client connect the broker , the ActiveMQ server print Transport failed: java.io.EOFException...


        And then I found a detail , after a while , such two minute or more,
        the client , the ActiveMQ server doesnot print Transport failed . Why no Exception???

        Show
        LinZuXiong added a comment - - edited Transport failed: java.io.EOFException ActiveMQ 5.5, too. And ActiveMQ cannot restore. cd $ {mq_dir} /bin way 1: ./activemq stop then ./activemq start way 2: ./activemq restart when every client connect the broker , the ActiveMQ server print Transport failed: java.io.EOFException... – And then I found a detail , after a while , such two minute or more, the client , the ActiveMQ server doesnot print Transport failed . Why no Exception???
        Hide
        Jeronimo Azevedo added a comment -

        I am out of office statring monday 08 august till friday 19 august.
        For any APS/WebServices please contact Jerome Neumanns.
        For any other questions you can contact bb-staff@ept.lu

        Show
        Jeronimo Azevedo added a comment - I am out of office statring monday 08 august till friday 19 august. For any APS/WebServices please contact Jerome Neumanns. For any other questions you can contact bb-staff@ept.lu
        Hide
        Timothy Bish added a comment -

        These sorts or error messages are normal when the connections are broken between client and server.

        Show
        Timothy Bish added a comment - These sorts or error messages are normal when the connections are broken between client and server.
        Hide
        Gary Tully added a comment -

        spotted the comment above about the task runner and build a quick sanity test. When the max iterations is exceeded the task is enqueued if not done and the reconnects can continue. http://svn.apache.org/viewvc?rev=1208377&view=rev

        Show
        Gary Tully added a comment - spotted the comment above about the task runner and build a quick sanity test. When the max iterations is exceeded the task is enqueued if not done and the reconnects can continue. http://svn.apache.org/viewvc?rev=1208377&view=rev
        Hide
        siddeshwar_hiremath added a comment -

        2014-12-31 05:41:38,396 | DEBUG | Running WriteCheck[tcp://127.0.0.1:52513] | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor Worker
        2014-12-31 05:41:40,132 | DEBUG | 30000ms elapsed since last read check. | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor ReadCheckTimer
        2014-12-31 05:41:40,134 | DEBUG | WriteChecker: 9999ms elapsed since last write check. | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor WriteCheckTimer
        2014-12-31 05:41:40,135 | DEBUG | Running WriteCheck[tcp://127.0.0.1:52516] | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor Worker
        2014-12-31 05:41:41,008 | DEBUG | Transport Connection to: tcp://127.0.0.1:52770 failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ NIO Worker 2
        java.io.EOFException
        at org.apache.activemq.transport.nio.NIOTransport.serviceRead(NIOTransport.java:98)[activemq-client-5.10.0.jar:5.10.0]
        at org.apache.activemq.transport.nio.NIOTransport$1.onSelect(NIOTransport.java:69)[activemq-client-5.10.0.jar:5.10.0]
        at org.apache.activemq.transport.nio.SelectorSelection.onSelect(SelectorSelection.java:94)[activemq-client-5.10.0.jar:5.10.0]
        at org.apache.activemq.transport.nio.SelectorWorker$1.run(SelectorWorker.java:119)[activemq-client-5.10.0.jar:5.10.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)[:1.7.0_40]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)[:1.7.0_40]
        at java.lang.Thread.run(Thread.java:724)[:1.7.0_40]
        2014-12-31 05:41:41,009 | DEBUG | Unregistering MBean org.apache.activemq:type=Broker,brokerName=localhost,connector=clientConnectors,connectorName=nio_//localhost_61616&qe;transport.keepAlive&true&transport.wireFormat.maxFrameSize&104857600&useQueueForAccept&false,connectionViewType=remoteAddress,connectionName=tcp_//127.0.0.1_52770 | org.apache.activemq.broker.jmx.ManagementContext | ActiveMQ NIO Worker 2
        2014-12-31 05:41:41,013 | DEBUG | Stopping connection: tcp://127.0.0.1:52770 | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService[localhost] Task-3
        2014-12-31 05:41:41,015 | DEBUG | Stopping transport tcp:///127.0.0.1:52770@61616 | org.apache.activemq.transport.tcp.TcpTransport | ActiveMQ BrokerService[localhost] Task-3
        2014-12-31 05:41:41,017 | DEBUG | Initialized TaskRunnerFactory[ActiveMQ Task] using ExecutorService: java.util.concurrent.ThreadPoolExecutor@6721ffbf[Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0] | org.apache.activemq.thread.TaskRunnerFactory | ActiveMQ BrokerService[localhost] Task-3
        2014-12-31 05:41:41,018 | DEBUG | Closed socket Socket[unconnected] | org.apache.activemq.transport.tcp.TcpTransport | ActiveMQ Task-1
        2014-12-31 05:41:41,020 | DEBUG | Forcing shutdown of ExecutorService: java.util.concurrent.ThreadPoolExecutor@6721ffbf[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0] | org.apache.activemq.util.ThreadPoolUtils | ActiveMQ BrokerService[localhost] Task-3
        2014-12-31 05:41:41,020 | DEBUG | Stopped transport: null | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService[localhost] Task-3
        2014-12-31 05:41:41,020 | DEBUG | Connection Stopped: null | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService[localhost] Task-3
        2014-12-31 05:41:42,629 | DEBUG | Checkpoint started. | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker

        Show
        siddeshwar_hiremath added a comment - 2014-12-31 05:41:38,396 | DEBUG | Running WriteCheck [tcp://127.0.0.1:52513] | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor Worker 2014-12-31 05:41:40,132 | DEBUG | 30000ms elapsed since last read check. | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor ReadCheckTimer 2014-12-31 05:41:40,134 | DEBUG | WriteChecker: 9999ms elapsed since last write check. | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor WriteCheckTimer 2014-12-31 05:41:40,135 | DEBUG | Running WriteCheck [tcp://127.0.0.1:52516] | org.apache.activemq.transport.AbstractInactivityMonitor | ActiveMQ InactivityMonitor Worker 2014-12-31 05:41:41,008 | DEBUG | Transport Connection to: tcp://127.0.0.1:52770 failed: java.io.EOFException | org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ NIO Worker 2 java.io.EOFException at org.apache.activemq.transport.nio.NIOTransport.serviceRead(NIOTransport.java:98) [activemq-client-5.10.0.jar:5.10.0] at org.apache.activemq.transport.nio.NIOTransport$1.onSelect(NIOTransport.java:69) [activemq-client-5.10.0.jar:5.10.0] at org.apache.activemq.transport.nio.SelectorSelection.onSelect(SelectorSelection.java:94) [activemq-client-5.10.0.jar:5.10.0] at org.apache.activemq.transport.nio.SelectorWorker$1.run(SelectorWorker.java:119) [activemq-client-5.10.0.jar:5.10.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [:1.7.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [:1.7.0_40] at java.lang.Thread.run(Thread.java:724) [:1.7.0_40] 2014-12-31 05:41:41,009 | DEBUG | Unregistering MBean org.apache.activemq:type=Broker,brokerName=localhost,connector=clientConnectors,connectorName=nio_//localhost_61616&qe;transport.keepAlive&true&transport.wireFormat.maxFrameSize&104857600&useQueueForAccept&false,connectionViewType=remoteAddress,connectionName=tcp_//127.0.0.1_52770 | org.apache.activemq.broker.jmx.ManagementContext | ActiveMQ NIO Worker 2 2014-12-31 05:41:41,013 | DEBUG | Stopping connection: tcp://127.0.0.1:52770 | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService [localhost] Task-3 2014-12-31 05:41:41,015 | DEBUG | Stopping transport tcp:///127.0.0.1:52770@61616 | org.apache.activemq.transport.tcp.TcpTransport | ActiveMQ BrokerService [localhost] Task-3 2014-12-31 05:41:41,017 | DEBUG | Initialized TaskRunnerFactory [ActiveMQ Task] using ExecutorService: java.util.concurrent.ThreadPoolExecutor@6721ffbf [Running, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0] | org.apache.activemq.thread.TaskRunnerFactory | ActiveMQ BrokerService [localhost] Task-3 2014-12-31 05:41:41,018 | DEBUG | Closed socket Socket [unconnected] | org.apache.activemq.transport.tcp.TcpTransport | ActiveMQ Task-1 2014-12-31 05:41:41,020 | DEBUG | Forcing shutdown of ExecutorService: java.util.concurrent.ThreadPoolExecutor@6721ffbf [Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0] | org.apache.activemq.util.ThreadPoolUtils | ActiveMQ BrokerService [localhost] Task-3 2014-12-31 05:41:41,020 | DEBUG | Stopped transport: null | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService [localhost] Task-3 2014-12-31 05:41:41,020 | DEBUG | Connection Stopped: null | org.apache.activemq.broker.TransportConnection | ActiveMQ BrokerService [localhost] Task-3 2014-12-31 05:41:42,629 | DEBUG | Checkpoint started. | org.apache.activemq.store.kahadb.MessageDatabase | ActiveMQ Journal Checkpoint Worker
        Hide
        siddeshwar_hiremath added a comment -

        java.io.EOFException

        see the above log trace for complete detail.

        Show
        siddeshwar_hiremath added a comment - java.io.EOFException see the above log trace for complete detail.

          People

          • Assignee:
            Unassigned
            Reporter:
            Dennis Klinkott
          • Votes:
            5 Vote for this issue
            Watchers:
            12 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development