Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.5
    • Fix Version/s: 0.5
    • Component/s: proton-c
    • Labels:
      None
    • Environment:
      osx

      Description

      If a peer closes the socket at an inopportune time, poll() starts returning POLLHUP but not POLLERR. This drives messenger into a busy loop, because the driver does not check this flag.

      The messenger instance is still able to service other connections, but it does so at 100% CPU load because every poll() call returns immediately.
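
The effect described above can be observed outside Proton. The following is a minimal sketch, assuming a POSIX system; it uses a pipe as a stand-in for the socket (closing the write end plays the role of the peer going away), and the helper name hup_wakeups is hypothetical. It counts how many poll() calls report POLLHUP even though only POLLIN was requested and a generous timeout was given.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: returns how many of `rounds` poll() calls reported
 * POLLHUP on a pipe whose write end has been closed, or -1 if the pipe
 * could not be created. */
static int hup_wakeups(int rounds)
{
    assert(rounds >= 0);

    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    close(fds[1]);                  /* the "peer" goes away */

    int hits = 0;
    for (int i = 0; i < rounds; i++) {
        struct pollfd pfd = { .fd = fds[0], .events = POLLIN, .revents = 0 };
        /* A healthy idle descriptor would block here for up to a second.
         * After the hangup, poll() returns at once, every time. */
        if (poll(&pfd, 1, 1000) > 0 && (pfd.revents & POLLHUP))
            hits++;
    }

    close(fds[0]);
    return hits;
}
```

POLLHUP is reported whether or not it was requested in events, so a loop that only looks at POLLIN, POLLOUT and POLLERR never consumes the condition and spins.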

      1. 0001-Handle-POLLHUP-as-pending-io.patch
        2 kB
        Bozo Dragojevic
      2. 0001-Handle-POLLHUP-as-POLLERR.patch
        1 kB
        Bozo Dragojevic
      3. confuse-driver.py
        4 kB
        Bozo Dragojevic

        Activity

        Ted Ross made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Fix Version/s 0.5 [ 12324004 ]
        Resolution Fixed [ 1 ]
        ASF subversion and git services added a comment -

        Commit 1510573 from Ted Ross in branch 'proton/trunk'
        [ https://svn.apache.org/r1510573 ]

        PROTON-372 - Add better handing of poll status values in the driver.
        Applied patch from Bozo Dragojevic.

        Ted Ross made changes -
        Assignee Ted Ross [ tedross ]
        Bozo Dragojevic made changes -
        Attachment 0001-Handle-POLLHUP-as-pending-io.patch [ 12595371 ]
        Bozo Dragojevic added a comment -

        This seems like a less heavy-handed approach to handling POLLHUP.

        Handle POLLHUP as pending io

        According to http://www.greenend.org.uk/rjk/tech/poll.html and
        http://lkml.indiana.edu/hypermail/linux/kernel/0404.0/0770.html
        it is better to treat POLLHUP as an input event than as an error.

        If a connector is in a state where no input is expected,
        for example because the final close frame was already received,
        POLLHUP is treated as an output event.

        Bozo Dragojevic made changes -
        Attachment 0001-Handle-POLLHUP-as-POLLERR.patch [ 12595248 ]
        Bozo Dragojevic added a comment -

        (gdb) p ssn
        $1 = (pn_session_t *) 0x0

        1679 // XXX: what if session is NULL?
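
The XXX comment points at a missing NULL check before the session is dereferenced. A minimal sketch of such a guard follows; every type and function here (session_t, transport_t, find_session, handle_begin) is hypothetical and only stands in for the real engine code, which may resolve the session differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the engine's session and transport types. */
typedef struct { uint32_t incoming_transfer_count; } session_t;

#define MAX_CHANNELS 4
typedef struct { session_t *by_channel[MAX_CHANNELS]; } transport_t;

/* Look up the session bound to a channel; NULL if there is none. */
static session_t *find_session(transport_t *t, uint16_t channel)
{
    assert(t != NULL);
    return channel < MAX_CHANNELS ? t->by_channel[channel] : NULL;
}

/* Handle a begin frame. Returns 0 on success, or -1 if the frame refers
 * to a channel with no session, instead of dereferencing NULL. */
static int handle_begin(transport_t *t, uint16_t remote_channel, uint32_t next)
{
    session_t *ssn = find_session(t, remote_channel);
    if (ssn == NULL)
        return -1;              /* caller can report a protocol error */
    ssn->incoming_transfer_count = next;
    return 0;
}
```

Returning an error code leaves the decision of how to fail (close the connection, log, ignore) to the caller.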

        Bozo Dragojevic made changes -
        Field Original Value New Value
        Attachment confuse-driver.py [ 12595010 ]
        Bozo Dragojevic added a comment -

        Running against

        PN_TRACE_RAW=1 PN_TRACE_FRM=1 PN_TRACE_DRV=1 tests/tools/apps/c/msgr-recv -a amqp://~0.0.0.0:55555/rcv -R -N msgr-recv -V -w100 -W100

        and running confuse-driver.py once kicks messenger into a busy loop.

        If I run it once more, it usually ends in a segfault:

        [0x100878000:1] <- @begin [remote-channel=1, next-outgoing-id=0, incoming-window=2147483647, outgoing-window=0]

        Program received signal EXC_BAD_ACCESS, Could not access memory.
        Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000aa0
        0x0000000100021273 in pn_do_begin (disp=0x100878000) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/engine/engine.c:1684
        1684 ssn->state.incoming_transfer_count = next;
        (gdb) where
        #0 0x0000000100021273 in pn_do_begin (disp=0x100878000) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/engine/engine.c:1684
        #1 0x000000010001cf92 in pn_dispatch_frame (disp=0x100878000, frame={type = 0 '\0', channel = 1, ex_size = 0, extended = 0x100874008 "", size = 24, payload = 0x100874008 ""}) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/dispatcher/dispatcher.c:151
        #2 0x000000010001d2b6 in pn_dispatcher_input (disp=0x100878000, bytes=0x100874000 "", available=87) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/dispatcher/dispatcher.c:174
        #3 0x000000010002a77d in pn_input_read_amqp (io_layer=0x10086f9e0, bytes=0x100874000 "", available=119) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/engine/engine.c:2233
        #4 0x0000000100020df4 in pn_io_layer_input_passthru (io_layer=0x10086f9a8, data=0x100874000 "", available=119) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/engine/engine.c:3273
        #5 0x0000000100020df4 in pn_io_layer_input_passthru (io_layer=0x10086f970, data=0x100874000 "", available=119) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/engine/engine.c:3273
        #6 0x0000000100022a05 in transport_consume (transport=0x10086f400) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/engine/engine.c:2154
        #7 0x0000000100026b77 in pn_transport_push (transport=0x10086f400, size=119) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/engine/engine.c:3342
        #8 0x0000000100037167 in pn_connector_process (c=0x10011d500) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/posix/driver.c:578
        #9 0x0000000100030f66 in pn_messenger_tsync (messenger=0x100101430, predicate=0x100032d40 <pn_messenger_rcvd>, timeout=-1) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/messenger/messenger.c:679
        #10 0x0000000100031122 in pn_messenger_sync (messenger=0x100101430, predicate=0x100032d40 <pn_messenger_rcvd>) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/messenger/messenger.c:710
        #11 0x00000001000330b4 in pn_messenger_recv (messenger=0x100101430, n=-1) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/proton-c/src/messenger/messenger.c:1286
        #12 0x0000000100001d23 in main (argc=9, argv=0x7fff5fbff978) at /Users/bozzo/XLII/blpapi-reference/qpid-proton/tests/tools/apps/c/msgr-recv.c:215
        Current language: auto; currently minimal
        (gdb)

        Bozo Dragojevic created issue -

          People

          • Assignee:
            Ted Ross
            Reporter:
            Bozo Dragojevic
          • Votes:
            0
            Watchers:
            2
