ActiveMQ / AMQ-1873

Stomp connections don't seem to be released cleanly (Too many open files error)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 5.1.0
    • Fix Version/s: NEEDS_REVIEW
    • Component/s: None
    • Labels:
      None
    • Environment:

      Linux RHEL 4
      java version "1.5.0_15"
      Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_15-b04)
      Java HotSpot(TM) Server VM (build 1.5.0_15-b04, mixed mode)

      Description

      I'm currently testing ActiveMQ with a Python STOMP client (stomp.py, http://www.briggs.net.nz/log/projects/stomppy/).

      I'm trying to load the broker with ~200 producers (on ~200 hosts) feeding a single topic that is read by only one consumer.
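
      For illustration, one of these producers could look roughly like the sketch below. This is a minimal, non-authoritative example written against a recent stomp.py release; the 2008-era stomp.py API differed, and the broker host and topic name here are placeholders.

          # Minimal sketch of a single STOMP producer (assumes a recent stomp.py;
          # host, port and destination are illustrative placeholders).
          import stomp

          conn = stomp.Connection([("broker-host", 61613)])
          conn.connect(wait=True)
          conn.send(destination="/topic/load.test", body="test payload")
          conn.disconnect()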

      Everything works almost fine until client connections start to fail.

      The ActiveMQ log shows:
      ...
      2008-07-30 09:28:43,665 [localhost:61613] ERROR TransportConnector - Could not accept connection : Too many open files
      2008-07-30 09:28:43,665 [localhost:61613] DEBUG TransportConnector - Reason: Too many open files
      java.net.SocketException: Too many open files
      at java.net.PlainSocketImpl.socketAccept(Native Method)
      at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
      at java.net.ServerSocket.implAccept(ServerSocket.java:450)
      at java.net.ServerSocket.accept(ServerSocket.java:421)
      at org.apache.activemq.transport.tcp.TcpTransportServer.run(TcpTransportServer.java:221)
      at java.lang.Thread.run(Thread.java:595)

      When I look at the open files and connections with lsof, I see 65446 lines like:

      java 26765 root *066u sock 0,4 7534034 can't identify protocol

      This seems to indicate that ActiveMQ doesn't fully release the underlying sockets.
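
      To cross-check the leak without lsof, one option (a sketch assuming a Linux host, sufficient privileges, and that 26765 is the broker's PID as in the lsof output above) is to count the broker's socket descriptors under /proc directly:

          # Count how many of a process's open descriptors are sockets.
          # Assumes Linux and permission to read /proc/<pid>/fd (e.g. run as root).
          import os

          pid = 26765  # replace with the broker's actual PID
          fd_dir = f"/proc/{pid}/fd"
          links = [os.readlink(os.path.join(fd_dir, fd)) for fd in os.listdir(fd_dir)]
          sockets = [l for l in links if l.startswith("socket:")]
          print(f"{len(sockets)} of {len(links)} open descriptors are sockets")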

      I'm attaching the activemq.xml conf file.

        Activity

        Alex Hollerith added a comment -

        config:
        <policyEntry queue=">" producerFlowControl="true" memoryLimit="5mb">

        setup:
        1 perl stomp producer producing into a queue,
        connecting and disconnecting on every post,
        rather low frequency of posts (0.5/minute)
        0 consumers

        behaviour:
        works OK until around 68 messages are in the queue (surely depends on the size of the messages)

        after that you get this in the log:
        2009-10-29 20:32:05,189 | INFO | Usage Manager memory limit reached on queue://test.soccerfeeds.queue. Producers will be throttled to the rate at which messages are removed <...>

        While the ActiveMQ service is in that "throttling producers" state, you will see CLOSE_WAIT sockets building up:
        tcp 0 0 ::ffff:10.60.1.51:61613 ::ffff:10.60.1.206:41519 CLOSE_WAIT
        tcp 0 0 ::ffff:10.60.1.51:61613 ::ffff:10.60.1.206:36141 CLOSE_WAIT
        tcp 0 0 ::ffff:10.60.1.51:61613 ::ffff:10.60.1.206:45840 CLOSE_WAIT
        tcp 0 0 ::ffff:10.60.1.51:61613 ::ffff:10.60.1.206:43793 CLOSE_WAIT
        tcp 0 0 ::ffff:10.60.1.51:61613 ::ffff:10.60.1.206:40212 CLOSE_WAIT
        tcp 0 0 ::ffff:10.60.1.51:61613 ::ffff:10.60.1.206:44060 CLOSE_WAIT
        tcp 0 0 ::ffff:10.60.1.51:61613 ::ffff:10.60.1.206:43776 CLOSE_WAIT
        tcp 0 0 ::ffff:10.60.1.51:61613 ::ffff:10.60.1.206:44032 CLOSE_WAIT
        tcp 0 0 ::ffff:10.60.1.51:61613 ::ffff:10.60.1.206:43781 CLOSE_WAIT
        tcp 0 0 ::ffff:10.60.1.51:61613 ::ffff:10.60.1.206:40200 CLOSE_WAIT
        tcp 0 0 ::ffff:10.60.1.51:61613 ::ffff:10.60.1.206:44045 CLOSE_WAIT

        You can watch the numbers grow with:
        watch --interval=5 'netstat -an |grep tcp |grep 61613 | grep CLOSE_WAIT |wc -l'

        Every post increases the number of CLOSE_WAIT sockets by one, and the sockets do not go away; the count grows steadily.

        Consume just a single message (we did it via the admin web interface) and the number of sockets in CLOSE_WAIT drops to 0 instantly.

        [root@bladedist01 activemq]# netstat -an |grep tcp |grep 61613
        tcp 0 0 :::61613 :::* LISTEN

        Our theory is that ActiveMQ somehow keeps building up sockets in CLOSE_WAIT state while it is in "throttling producers" mode on a given queue until, eventually, the system runs out of resources (file descriptors in this case). A socket stuck in CLOSE_WAIT means the remote end has already closed the connection but the local process has never closed its side, so the broker itself appears to be holding the descriptors open.
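
        The mechanism itself is easy to demonstrate outside ActiveMQ. The sketch below uses plain Python sockets (nothing ActiveMQ-specific; the port is chosen by the OS) to show that sockets end up in CLOSE_WAIT exactly when the peer has closed its end but the accepting process never calls close():

          # Illustration only: an accepting process that never closes its sockets
          # accumulates CLOSE_WAIT entries once the clients hang up.
          import socket
          import subprocess
          import threading
          import time

          server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          server.bind(("127.0.0.1", 0))   # let the OS pick a free port
          port = server.getsockname()[1]
          server.listen(16)

          leaked = []                      # accepted sockets we deliberately never close

          def acceptor():
              while len(leaked) < 5:
                  conn, _ = server.accept()
                  leaked.append(conn)      # "forgetting" to close leaves it in CLOSE_WAIT

          threading.Thread(target=acceptor, daemon=True).start()

          for _ in range(5):
              c = socket.create_connection(("127.0.0.1", port))
              c.close()                    # client closes; the server side is never closed

          time.sleep(1)
          # On Linux, `ss` now lists five CLOSE_WAIT sockets on that port:
          print(subprocess.run(["ss", "-tan", "state", "close-wait"],
                               capture_output=True, text=True).stdout)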

        Ruben Wagner added a comment -

        We can reproduce this issue on a Linux platform running under high load. One client is sufficient to produce the stale sockets.
        The client has to do something like this (Perl):

        while(1)
        {
            my $c = Stomp::Connection->new(host=>"localhost", port=>61613 );
            $c->Send( "/topic/foo", "hi5" );
            $c->Disconnect();
        }
        

        We have reproduced the problem with the following transport connectors:

        • stomp (Perl, PHP)
        • stomp+nio (Perl, PHP)
        • openwire (Java)

        We have reproduced the problem with the following builds: 5.3.2, 5.3.1, 5.4-SNAPSHOT (we did not try the others)

        It looks like a race condition to me, as the problem occurs more often the weaker the machine is. It does not occur when I add a short sleep to my test script.

        Testing environments:
        #1:
        Linux version 2.6.26.8+pata
        java version "1.6.0_12"
        Java(TM) SE Runtime Environment (build 1.6.0_12-b04)
        Java HotSpot(TM) 64-Bit Server VM (build 11.2-b01, mixed mode)

        #2:
        Linux version 2.6.32-23-generic-pae
        java version "1.6.0_20"
        Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
        Java HotSpot(TM) Server VM (build 16.3-b01, mixed mode)

        P.S. Excuse my English; I'm not a native speaker.

        Ryan Smith added a comment -

        Is this the same issue as AMQ-1739?

        Timothy Bish added a comment -

        This is in some ways related to AMQ-3393. There is also the fact that STOMP 1.0 doesn't provide a keep-alive, so connection drops sometimes aren't detected.
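
        As an aside, STOMP 1.1 and later add heart-beating, which gives the broker a way to notice dead client connections. A minimal client-side sketch with a recent stomp.py is shown below; the heartbeats values (milliseconds) and the host/port are illustrative, and this is a mitigation sketch rather than the fix for this issue.

          # Negotiate 10-second heart-beats in both directions so the broker can
          # reap dead connections (requires STOMP 1.1+ on both client and broker).
          import stomp

          conn = stomp.Connection([("localhost", 61613)], heartbeats=(10000, 10000))
          conn.connect(wait=True)
          conn.send(destination="/topic/foo", body="hi")
          conn.disconnect()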

        Reopen if the problem persists.


          People

          • Assignee: Unassigned
          • Reporter: Julien Devemy
          • Votes: 8
          • Watchers: 10
