ServiceMix
  1. ServiceMix
  2. SM-512

sendsync from a service to another service seems to cause a deadlock under load

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Won't Fix
    • Affects Version/s: 3.0-M2
    • Fix Version/s: 3.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Windows 2003, Intel 2.8 xeon processor. Java 1.5 3.0-M2

      Description

      We have 2 services A and B. A makes sync requests to B. We have a JMS client that feeds requests to A and that triggers a sync request to B. We are trying to push some 5000 requests (not sequential). Now
      With st flow - A gets all the responses from B and we are good
      with Seda flow - A gets some responses from B (sometimes none) and then all the threads are blocked (seen using JMX) on syncsend(), looks like B has no threads to service requests.
      with JMS flow ( A and B on different VMs), we see the similar behavior as with seda flow

      With async requests everything works just fine. We have tried increasing seda capacity and workmanager thread count, that did not help. I am attaching service A and B code along with configuration files

      Also it would be nice to have all the tunable features documented with some explaination somewhere ( I could not find it anywhere)

        Activity

        Hide
        Guillaume Nodet added a comment -

        I' ve been thinking about that a bit.
        The main problem is that as soon as components use sendSync, this can lead to deadlocks, unless
        you have an unbounded number of threads.

        Let' s take the worst possible example. We have a component that implements the factorial of an integer using
        recursive sendSync. The number of threads needed for this very component is unbounded. Of course, I hope that
        nobody will ever implement it that way.

        So, in your case, you have a jms queue with thousands of pending messages, but I think this is the same problem.
        The jms component will use its thread pool to process incoming jms messages. If you have more jms messages
        than threads, each thread of the pool will be used to process one jms message. If the jms component use
        sendSync, and if the exchange ever need to come back to the jms component, there will be a deadlock.

        I really don't see any solutions here, other than using asynchonous send, or an unbounded thread pool.
        The workaround is to use throttling, as defined in the jmx mbeans for the component.

        We need a way to easily configure all these parameters (queue capacity for seda / delivery channel, thread pool, throttling).

        Show
        Guillaume Nodet added a comment - I' ve been thinking about that a bit. The main problem is that as soon as components use sendSync, this can lead to deadlocks, unless you have an unbounded number of threads. Let' s take the worst possible example. We have a component that implements the factorial of an integer using recursive sendSync. The number of threads needed for this very component is unbounded. Of course, I hope that nobody will ever implement it that way. So, in your case, you have a jms queue with thousands of pending messages, but I think this is the same problem. The jms component will use its thread pool to process incoming jms messages. If you have more jms messages than threads, each thread of the pool will be used to process one jms message. If the jms component use sendSync, and if the exchange ever need to come back to the jms component, there will be a deadlock. I really don't see any solutions here, other than using asynchonous send, or an unbounded thread pool. The workaround is to use throttling, as defined in the jmx mbeans for the component. We need a way to easily configure all these parameters (queue capacity for seda / delivery channel, thread pool, throttling).
        Hide
        Guillaume Nodet added a comment -

        I'm closing this issue as there is no real bug we can fix here.
        I've raised SM-521 to ease tuning configuration

        Show
        Guillaume Nodet added a comment - I'm closing this issue as there is no real bug we can fix here. I've raised SM-521 to ease tuning configuration
        Hide
        Krzysztof Sobolewski added a comment -

        I don't really agree there is no real bug here, because it violates the principle of least surprise. I tripped on the same problem and spent about two days of investigation before I found the cause and reached this report. I think that a deadlock resulting from a compliant use is a bug no matter what.

        But I agree that it might be tricky to fix. How about, for example, setting the exchage state to SYNC_STATE_SYNC_SENT before returning if the same thread is executing both sides of the conversation (let's call it super-synchronous)?

        I'm basing it on the observation that in org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.sendSync(MessageExchange, long) there is:
        if (me.getSyncState() != MessageExchangeImpl.SYNC_STATE_SYNC_RECEIVED)

        { waitForExchange(me, me, timeout, "sendSync"); }

        else {
        and the call to waitForExchange() calls me.wait() which waits for itself (and thus deadlocks).

        Show
        Krzysztof Sobolewski added a comment - I don't really agree there is no real bug here, because it violates the principle of least surprise. I tripped on the same problem and spent about two days of investigation before I found the cause and reached this report. I think that a deadlock resulting from a compliant use is a bug no matter what. But I agree that it might be tricky to fix. How about, for example, setting the exchage state to SYNC_STATE_SYNC_SENT before returning if the same thread is executing both sides of the conversation (let's call it super-synchronous)? I'm basing it on the observation that in org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.sendSync(MessageExchange, long) there is: if (me.getSyncState() != MessageExchangeImpl.SYNC_STATE_SYNC_RECEIVED) { waitForExchange(me, me, timeout, "sendSync"); } else { and the call to waitForExchange() calls me.wait() which waits for itself (and thus deadlocks).
        Hide
        Guillaume Nodet added a comment -

        I think it's even a bit more tricky.
        There are two sides in the message exchange, each side being used by one component. So when there is a wait called from waitForExchange(), the other component will unlock it with a call to notifyExchange(exchange.getMirror()).
        So it works, as long as there are two threads involved, which should always be the case. However, I'm wondering if there could be some cases under high load, where a single thread is used (in the super-synchronous case) and this would be problematic, because the jbi semantic could not be achieved: if you use an InOut, both components could use sendSync (but for the DONE status), and it should work. In this case, the second sendSync should be blocked until the DONE status is processed.
        Where do you suggest to add the SYNC_STATE_SYNC_SENT stuff ?

        Show
        Guillaume Nodet added a comment - I think it's even a bit more tricky. There are two sides in the message exchange, each side being used by one component. So when there is a wait called from waitForExchange(), the other component will unlock it with a call to notifyExchange(exchange.getMirror()). So it works, as long as there are two threads involved, which should always be the case. However, I'm wondering if there could be some cases under high load, where a single thread is used (in the super-synchronous case) and this would be problematic, because the jbi semantic could not be achieved: if you use an InOut, both components could use sendSync (but for the DONE status), and it should work. In this case, the second sendSync should be blocked until the DONE status is processed. Where do you suggest to add the SYNC_STATE_SYNC_SENT stuff ?
        Hide
        Guillaume Nodet added a comment -

        I've run a small test, and this use case it not really handled well. So I'd agree there is a bug.
        I'm not sure what's the way to solve this yet, but the problem is that i end up with the following stack trace:

        main@2, priority=5, in group 'main', status: 'RUNNING'
        at org.apache.servicemix.jbi.nmr.flow.st.STFlow.doSend(STFlow.java:50)
        at org.apache.servicemix.jbi.nmr.flow.AbstractFlow.send(AbstractFlow.java:123)
        at org.apache.servicemix.jbi.nmr.DefaultBroker.sendExchangePacket(DefaultBroker.java:283)
        at org.apache.servicemix.jbi.container.JBIContainer.sendExchange(JBIContainer.java:843)
        at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.doSend(DeliveryChannelImpl.java:395)
        at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.sendSync(DeliveryChannelImpl.java:470)
        at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.sendSync(DeliveryChannelImpl.java:442)
        at org.apache.servicemix.components.util.PojoSupport.sendSync(PojoSupport.java:208)
        at org.apache.servicemix.jbi.messaging.MEPExchangeTest$TestComponent.onMessageExchange(MEPExchangeTest.java:71)
        at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.processInBound(DeliveryChannelImpl.java:610)
        at org.apache.servicemix.jbi.nmr.flow.AbstractFlow.doRouting(AbstractFlow.java:172)
        at org.apache.servicemix.jbi.nmr.flow.st.STFlow.doSend(STFlow.java:49)
        at org.apache.servicemix.jbi.nmr.flow.AbstractFlow.send(AbstractFlow.java:123)
        at org.apache.servicemix.jbi.nmr.DefaultBroker.sendExchangePacket(DefaultBroker.java:283)
        at org.apache.servicemix.jbi.container.JBIContainer.sendExchange(JBIContainer.java:843)
        at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.doSend(DeliveryChannelImpl.java:395)
        at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.sendSync(DeliveryChannelImpl.java:470)
        at org.apache.servicemix.jbi.messaging.MEPExchangeTest.testInOutSyncSync(MEPExchangeTest.java:367)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at junit.framework.TestCase.runTest(TestCase.java:154)
        at junit.framework.TestCase.runBare(TestCase.java:127)
        at junit.framework.TestResult$1.protect(TestResult.java:106)
        at junit.framework.TestResult.runProtected(TestResult.java:124)
        at junit.framework.TestResult.run(TestResult.java:109)
        at junit.framework.TestCase.run(TestCase.java:118)
        at junit.textui.TestRunner.doRun(TestRunner.java:116)
        at com.intellij.rt.execution.junit.IdeaTestRunner.doRun(IdeaTestRunner.java:65)
        at junit.textui.TestRunner.doRun(TestRunner.java:109)
        at com.intellij.rt.execution.junit.IdeaTestRunner.startRunnerWithArgs(IdeaTestRunner.java:24)
        at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:118)
        at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:40)

        At the top of the thread, the processInbound method is called, but the checks are wrong and instead of deliverying the exchange, the lock at the bottom of the stack trace is notified, but not the one from the second sendSync. It's not even a deadlock, it's more than the right lock is never notified.
        So yeah, there is a missing SYNC_STATE_SYNC_RECEIVED i suppose, but also, the DONE status has to be delivered somehow.
        The test I used it to use the ST flow with a provider component implementing MessageExchangeListener and using sendSync on both sides: this leads to the super-synchronous use case ...

        Show
        Guillaume Nodet added a comment - I've run a small test, and this use case it not really handled well. So I'd agree there is a bug. I'm not sure what's the way to solve this yet, but the problem is that i end up with the following stack trace: main@2, priority=5, in group 'main', status: 'RUNNING' at org.apache.servicemix.jbi.nmr.flow.st.STFlow.doSend(STFlow.java:50) at org.apache.servicemix.jbi.nmr.flow.AbstractFlow.send(AbstractFlow.java:123) at org.apache.servicemix.jbi.nmr.DefaultBroker.sendExchangePacket(DefaultBroker.java:283) at org.apache.servicemix.jbi.container.JBIContainer.sendExchange(JBIContainer.java:843) at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.doSend(DeliveryChannelImpl.java:395) at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.sendSync(DeliveryChannelImpl.java:470) at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.sendSync(DeliveryChannelImpl.java:442) at org.apache.servicemix.components.util.PojoSupport.sendSync(PojoSupport.java:208) at org.apache.servicemix.jbi.messaging.MEPExchangeTest$TestComponent.onMessageExchange(MEPExchangeTest.java:71) at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.processInBound(DeliveryChannelImpl.java:610) at org.apache.servicemix.jbi.nmr.flow.AbstractFlow.doRouting(AbstractFlow.java:172) at org.apache.servicemix.jbi.nmr.flow.st.STFlow.doSend(STFlow.java:49) at org.apache.servicemix.jbi.nmr.flow.AbstractFlow.send(AbstractFlow.java:123) at org.apache.servicemix.jbi.nmr.DefaultBroker.sendExchangePacket(DefaultBroker.java:283) at org.apache.servicemix.jbi.container.JBIContainer.sendExchange(JBIContainer.java:843) at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.doSend(DeliveryChannelImpl.java:395) at org.apache.servicemix.jbi.messaging.DeliveryChannelImpl.sendSync(DeliveryChannelImpl.java:470) at org.apache.servicemix.jbi.messaging.MEPExchangeTest.testInOutSyncSync(MEPExchangeTest.java:367) at sun.reflect.NativeMethodAccessorImpl.invoke0(NativeMethodAccessorImpl.java:-1) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at junit.framework.TestCase.runTest(TestCase.java:154) at junit.framework.TestCase.runBare(TestCase.java:127) at junit.framework.TestResult$1.protect(TestResult.java:106) at junit.framework.TestResult.runProtected(TestResult.java:124) at junit.framework.TestResult.run(TestResult.java:109) at junit.framework.TestCase.run(TestCase.java:118) at junit.textui.TestRunner.doRun(TestRunner.java:116) at com.intellij.rt.execution.junit.IdeaTestRunner.doRun(IdeaTestRunner.java:65) at junit.textui.TestRunner.doRun(TestRunner.java:109) at com.intellij.rt.execution.junit.IdeaTestRunner.startRunnerWithArgs(IdeaTestRunner.java:24) at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:118) at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:40) At the top of the thread, the processInbound method is called, but the checks are wrong and instead of deliverying the exchange, the lock at the bottom of the stack trace is notified, but not the one from the second sendSync. It's not even a deadlock, it's more than the right lock is never notified. So yeah, there is a missing SYNC_STATE_SYNC_RECEIVED i suppose, but also, the DONE status has to be delivered somehow. The test I used it to use the ST flow with a provider component implementing MessageExchangeListener and using sendSync on both sides: this leads to the super-synchronous use case ...
        Hide
        Guillaume Nodet added a comment -

        Well, there is no way to solve this problem, as the second sendSync will only be unlocked when the consumer sends back the done status, which will not happen because the first sendSync is locked until the second sendSync is processed. However, I guess we could throw a meaningful exception instead of entering a deadlock.

        Show
        Guillaume Nodet added a comment - Well, there is no way to solve this problem, as the second sendSync will only be unlocked when the consumer sends back the done status, which will not happen because the first sendSync is locked until the second sendSync is processed. However, I guess we could throw a meaningful exception instead of entering a deadlock.
        Hide
        Guillaume Nodet added a comment -

        The exception could be sent if we can find a way to detect if both components are sharing the same thread.

        Show
        Guillaume Nodet added a comment - The exception could be sent if we can find a way to detect if both components are sharing the same thread.
        Hide
        Krzysztof Sobolewski added a comment -

        > Where do you suggest to add the SYNC_STATE_SYNC_SENT stuff ?

        I'm not so familiar with the architecture, I've just dug in and found what I think is the problem
        The exception is better than a deadlock (everything is) but I don't think it's the best solution. But it can be good enough, if there's no other reliable way and especially if it's documented properly...

        Show
        Krzysztof Sobolewski added a comment - > Where do you suggest to add the SYNC_STATE_SYNC_SENT stuff ? I'm not so familiar with the architecture, I've just dug in and found what I think is the problem The exception is better than a deadlock (everything is) but I don't think it's the best solution. But it can be good enough, if there's no other reliable way and especially if it's documented properly...
        Hide
        Krzysztof Sobolewski added a comment -

        And, BTW, it seems to me rather to be a special case of deadlock where a thread waits for itself. Notifying locks in this case is of no use, because notify works by waking up a thread that is already waiting, but you can't call notify from such a thread. Notifying before waiting has no effect. Then, I think, it simply should not wait

        And, also:
        > I'm wondering if there could be some cases under high load, where a single thread is used

        Yes, it is possible. Quoting http://servicemix.apache.org/thread-pools.html:

        • if the number of threads is less than corePoolSize, the executor will create a new thread to handle the job
        • if the number of queued jobs is less than queueSize, the job is queued
        • if the queue is full and the number of threads is less than maximumPoolSize, a new thread is created to handle the job
        • else, the current thread will handle the job
          It's the last point that is problematic.
        Show
        Krzysztof Sobolewski added a comment - And, BTW, it seems to me rather to be a special case of deadlock where a thread waits for itself. Notifying locks in this case is of no use, because notify works by waking up a thread that is already waiting, but you can't call notify from such a thread. Notifying before waiting has no effect. Then, I think, it simply should not wait And, also: > I'm wondering if there could be some cases under high load, where a single thread is used Yes, it is possible. Quoting http://servicemix.apache.org/thread-pools.html: if the number of threads is less than corePoolSize, the executor will create a new thread to handle the job if the number of queued jobs is less than queueSize, the job is queued if the queue is full and the number of threads is less than maximumPoolSize, a new thread is created to handle the job else, the current thread will handle the job It's the last point that is problematic.
        Hide
        Guillaume Nodet added a comment -

        Yeah, i know the problem comes from here, but the other solutions are to reject the work (which would mean the exchange is lost), or to have an unbounded thread pool (which may be problematic too). The real solution is to get rid of the sendSync as much as possible, which should be possible in smx4, because the transaction model is changed and the use of sendSync is not required anymore to convey transactions.

        Show
        Guillaume Nodet added a comment - Yeah, i know the problem comes from here, but the other solutions are to reject the work (which would mean the exchange is lost), or to have an unbounded thread pool (which may be problematic too). The real solution is to get rid of the sendSync as much as possible, which should be possible in smx4, because the transaction model is changed and the use of sendSync is not required anymore to convey transactions.

          People

          • Assignee:
            Guillaume Nodet
            Reporter:
            anand somani
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development