Qpid Proton / PROTON-639

pn_messenger_recv hangs / spins on connection refused

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: proton-0.7, proton-0.8
    • Fix Version/s: None
    • Component/s: proton-c
    • Labels:
    • Environment:

      Description

      If I try to connect to a closed port with a messenger, pn_messenger_recv outputs messages to stderr and then spins at high CPU usage, rather than returning with an error as expected.

      This seems to depend on the kernel version. I have a RHEL 6.5 machine which reproduces the problem reliably with kernel 2.6.32-431.1.2.el6.x86_64, but not with 3.10.28-1.el6.elrepo.x86_64.

      This can be easily reproduced using the "recv" example in the qpid-proton sources.

      kernel 2.6.32 - broken
      $ build/examples/messenger/c/recv amqp://127.0.0.1:1
      recv: Connection refused
      [0x63d8e0]:ERROR amqp:connection:framing-error SASL header mismatch: ''
      CONNECTION ERROR connection aborted (remote)
      # hangs at this point with high CPU usage
      

      Compare with the behavior on a later kernel version, which seems right:

      kernel 3.10.28 - expected behavior
      $ build/examples/messenger/c/recv amqp://127.0.0.1:1
      recv: Connection refused
      [0x15af8e0]:ERROR amqp:connection:framing-error SASL header mismatch: ''
      CONNECTION ERROR connection aborted (remote)
      send: Broken pipe
      /home/rmcgover/src/qpid-proton/examples/messenger/c/recv.c:132: no valid sources
      # exits with exit code 1
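
The expected behavior above, where a refused connection surfaces as an error instead of a hang, can be illustrated at the plain socket level. This is only a minimal sketch and not Proton code: `probe_connect` is a hypothetical helper, and it assumes a POSIX system with IPv4 loopback available. It starts a non-blocking connect, waits with `poll`, and reads `SO_ERROR` so that the failure is returned to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of Proton): attempt a non-blocking TCP
 * connect to ip:port. Returns 0 on success, or an errno value such as
 * ECONNREFUSED or ETIMEDOUT on failure. */
static int probe_connect(const char *ip, unsigned short port, int timeout_ms) {
    assert(ip != NULL);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) return EINVAL;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return errno;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        int e = errno;
        close(fd);
        return e;
    }

    int err = 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        err = errno;
        if (err == EINPROGRESS) {
            /* Connect is pending: wait until the socket reports a result. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int n = poll(&pfd, 1, timeout_ms);
            if (n == 0) {
                err = ETIMEDOUT;
            } else if (n < 0) {
                err = errno;
            } else {
                /* SO_ERROR holds the outcome of the connect attempt. */
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
                    err = errno;
            }
        }
    }
    close(fd);
    return err;
}
```

Called as `probe_connect("127.0.0.1", 1, 1000)` against a closed port, the helper should return a nonzero errno value rather than block; on Linux loopback this is typically ECONNREFUSED.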
      

      Here's a sample backtrace when the hang is occurring:

      (gdb) bt
      #0  0x00007ffff7ffea11 in clock_gettime ()
      #1  0x0000003a51e03e46 in clock_gettime () from /lib64/librt.so.1
      #2  0x00007ffff7de6b5e in pn_i_now () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
      #3  0x00007ffff7de4c06 in pn_selector_select () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
      #4  0x00007ffff7ddf736 in pni_wait () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
      #5  0x00007ffff7ddf869 in pn_messenger_tsync () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
      #6  0x00007ffff7ddf8df in pn_messenger_sync () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
      #7  0x00007ffff7de1676 in pn_messenger_recv () from /home/rmcgover/src/qpid-proton/build/proton-c/libqpid-proton.so.2
      #8  0x00000000004014b2 in main ()
      

      pn_messenger_tsync contains a while(true) loop that never seems to exit. strace also shows the process calling poll repeatedly.
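
The kind of loop described above can be pictured with a generic sketch. This is not Proton's implementation, and the actual cause of the spin is not established in this report: `wait_until`, `predicate_fn`, `now_ms`, and the two example predicates are hypothetical names chosen for illustration. The sketch only shows how a wait loop that checks a deadline on every pass returns an error instead of running forever when its condition never becomes true.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the awaited condition holds. */
typedef bool (*predicate_fn)(void *ctx);

/* Monotonic clock in milliseconds. */
static long long now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Wait until pred(ctx) is true. Returns 0 on success, -1 on timeout.
 * A negative timeout_ms means wait indefinitely. */
static int wait_until(predicate_fn pred, void *ctx, int timeout_ms) {
    assert(pred != NULL);
    long long deadline = timeout_ms < 0 ? -1 : now_ms() + timeout_ms;
    while (true) {
        if (pred(ctx)) return 0;
        int nap_ms = 10; /* upper bound on each sleep between checks */
        if (deadline >= 0) {
            long long left = deadline - now_ms();
            if (left <= 0) return -1; /* deadline passed: give up */
            if (left < nap_ms) nap_ms = (int)left;
        }
        /* poll with no descriptors acts as a millisecond sleep, so the
         * loop does not burn CPU while waiting. */
        poll(NULL, 0, nap_ms);
    }
}

/* Example predicates for illustration only. */
static bool never_ready(void *ctx) { (void)ctx; return false; }
static bool countdown_done(void *ctx) { int *n = ctx; return --(*n) <= 0; }
```

With these definitions, `wait_until(never_ready, NULL, 50)` gives up with -1 once the 50 ms deadline passes, while a predicate that eventually succeeds makes the call return 0.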

            People

            • Assignee: Unassigned
            • Reporter: Rohan McGovern (rmcgover)