Description
Hi Folks,
If I set the receive limit to 1 message and then send to a destination from multiple messengers on different threads, pn_messenger_send and pn_messenger_recv appear to deadlock. I have attached a small example which reproduces this.
This is the output we expect this application to produce:
Starting qpidListenerThread...
Waiting on data to come into pn_messenger_recv...
Data received by pn_messenger_recv...
Message received with subject: 'MESSAGE FROM MAIN THREAD'
Moving back to pn_messenger_recv
Waiting on data to come into pn_messenger_recv...
Starting qpidSenderThread...
Finished with qpidSenderThread...
Data received by pn_messenger_recv...
Message received with subject: 'MESSAGE FROM PTHREAD'
Moving back to pn_messenger_recv
Waiting on data to come into pn_messenger_recv...
This is what actually gets produced (note that the second message is never received):
Starting qpidListenerThread...
Waiting on data to come into pn_messenger_recv...
Data received by pn_messenger_recv...
Message received with subject: 'MESSAGE FROM MAIN THREAD'
Moving back to pn_messenger_recv
Waiting on data to come into pn_messenger_recv...
Starting qpidSenderThread...
At this point the application deadlocks with the following backtrace:
(gdb) thread apply all bt
Thread 3 (Thread 0xb77c9b70 (LWP 9431)):
#0 0x00cc8424 in __kernel_vsyscall ()
#1 0x0021cca6 in poll () from /lib/libc.so.6
#2 0x00c0f9fa in pn_driver_wait_2 ()
from /home/fquinn/lib/qpid-proton-0.5/lib/libqpid-proton.so.2
#3 0x00c0fd9f in pn_driver_wait ()
from /home/fquinn/lib/qpid-proton-0.5/lib/libqpid-proton.so.2
#4 0x00c0a4d1 in pn_messenger_tsync ()
from /home/fquinn/lib/qpid-proton-0.5/lib/libqpid-proton.so.2
#5 0x00c0a7bc in pn_messenger_sync ()
from /home/fquinn/lib/qpid-proton-0.5/lib/libqpid-proton.so.2
#6 0x00c0c27a in pn_messenger_recv ()
from /home/fquinn/lib/qpid-proton-0.5/lib/libqpid-proton.so.2
#7 0x08048953 in qpidListenerThread ()
#8 0x00355a49 in start_thread () from /lib/libpthread.so.0
#9 0x00227aee in clone () from /lib/libc.so.6
Thread 2 (Thread 0xb6dc8b70 (LWP 9432)):
#0 0x00cc8424 in __kernel_vsyscall ()
#1 0x0021cca6 in poll () from /lib/libc.so.6
#2 0x00c0f9fa in pn_driver_wait_2 ()
from /home/fquinn/lib/qpid-proton-0.5/lib/libqpid-proton.so.2
#3 0x00c0fd9f in pn_driver_wait ()
from /home/fquinn/lib/qpid-proton-0.5/lib/libqpid-proton.so.2
#4 0x00c0a4d1 in pn_messenger_tsync ()
from /home/fquinn/lib/qpid-proton-0.5/lib/libqpid-proton.so.2
#5 0x00c0a7bc in pn_messenger_sync ()
from /home/fquinn/lib/qpid-proton-0.5/lib/libqpid-proton.so.2
#6 0x00c0c1d5 in pn_messenger_send ()
from /home/fquinn/lib/qpid-proton-0.5/lib/libqpid-proton.so.2
#7 0x08048a5d in qpidSenderThread ()
#8 0x00355a49 in start_thread () from /lib/libpthread.so.0
#9 0x00227aee in clone () from /lib/libc.so.6
Thread 1 (Thread 0xb77ca990 (LWP 9430)):
#0 0x00cc8424 in __kernel_vsyscall ()
#1 0x0035610d in pthread_join () from /lib/libpthread.so.0
#2 0x08048bc9 in main ()
We know this can be avoided either by using the same messenger across the different threads for publishing or by setting a larger receive window, but we expected this to work regardless, and our existing implementation depends on it.
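For reference, the first workaround (one messenger shared by all publishing threads) could be sketched roughly as below. This is only an illustration under some assumptions: it assumes the shared messenger must not be used by two threads at once, so every publish goes through a mutex. The names shared_sender_t, send_fn_t, shared_sender_publish, counting_send and publisher_thread are hypothetical and not part of Proton, and counting_send is a stub that only counts calls so the sketch builds without the library.

```c
#include <pthread.h>
#include <stddef.h>

#define MESSAGES_PER_THREAD 1000

/* Hypothetical callback type: publishes one message through the shared handle.
 * Returns 0 on success, non-zero on failure. */
typedef int (*send_fn_t)(void *handle, const char *subject);

/* One sender shared by every publishing thread (all names are illustrative). */
typedef struct {
    pthread_mutex_t lock;   /* serializes every use of handle */
    void *handle;           /* would hold the shared messenger in a real program */
    send_fn_t send_fn;      /* how a single message actually gets published */
    unsigned long sent;     /* messages successfully handed to send_fn */
} shared_sender_t;

int shared_sender_init(shared_sender_t *s, void *handle, send_fn_t send_fn)
{
    s->handle = handle;
    s->send_fn = send_fn;
    s->sent = 0;
    return pthread_mutex_init(&s->lock, NULL);
}

/* Only one thread at a time gets to touch the shared handle. */
int shared_sender_publish(shared_sender_t *s, const char *subject)
{
    int rc;

    pthread_mutex_lock(&s->lock);
    rc = s->send_fn(s->handle, subject);
    if (rc == 0)
        s->sent++;
    pthread_mutex_unlock(&s->lock);
    return rc;
}

void shared_sender_destroy(shared_sender_t *s)
{
    pthread_mutex_destroy(&s->lock);
}

/* Stub so the sketch runs without Proton: handle points at an int counter. */
int counting_send(void *handle, const char *subject)
{
    (void)subject;
    ++*(int *)handle;
    return 0;
}

/* Thread entry point: publishes a fixed number of messages via the shared sender. */
void *publisher_thread(void *arg)
{
    shared_sender_t *s = arg;
    int i;

    for (i = 0; i < MESSAGES_PER_THREAD; i++)
        shared_sender_publish(s, "MESSAGE FROM PTHREAD");
    return NULL;
}
```

In a real program, counting_send would be replaced by a function that puts the message on the shared messenger and sends it (pn_messenger_put followed by pn_messenger_send), and each sender thread would be started with pthread_create(&tid, NULL, publisher_thread, &sender). This only sidesteps the problem; it does not explain why separate messengers block each other when the receive limit is 1.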
Cheers,
Frank