Created attachment 27584 [details] Sample dump with deadlock Have see occasional deadlocks when stopping a test plan with multiple HC4 test samplers. The cause of the deadlock may be due to HC4 - see https://issues.apache.org/jira/browse/HTTPCLIENT-1127 - but it's not yet clear why JMeter appears to be running threadFinished when interrupt is still active. See attached dump.
Created attachment 27587 [details] Thread Dump Of Issue
Created attachment 27588 [details] Log File of test that generates issue
Created attachment 27589 [details] Test Case that produced Deadlock Hello, I reproduced by One run the deadlock. Just add a mirror server on 8081 and run test case. When number of Running Threads reaches 40, call stop, deadlock occurs. Regards Philippe Mouawad
Created attachment 27590 [details] Log in DEBUG mode Zip that contains log file in DEBUG mode
Hello, For me the issue is an HTTPClient one. It is not a problem that appears to be running threadFinished when interrupt is still active. The explanation for this would be this kind of sequence: 1) Stopper thread (Thread-77 in my Thread Dump) first calls StandardJMeterEngine.tellThreadsToStop which calls stop() => Thread is suspended here 2) "Thread Group 1-32" gets stop call , run thread is stopped and we enter threadFinished => Thread is suspended 3) Stopper thread enters interrupt call and calls abort which is suspended just before AbstractClientConnAdapter.abortConnection 4) "Thread Group 1-32" locks org.apache.http.impl.conn.SingleClientConnManager through shutdown call Deadlock occurs. Regards Philippe Mouawad
Created attachment 27591 [details] Fix to the issue Hello, Here is my fix that should be removed once bug is fixed in HTTPCLient. To workaround the pb related to a different sequence of calls on shutdown from HTTPHC4Impl, I synchronize both request.abort(); and the for loop that shutdowns the connection managers of a thread. It should not have an important performance impact since it only concerns stop and shutdown sequences. Since I applied it I never reproduce issue. Regards Philippe Mouawad
Hi, I don't think the bug is only related to HTTPClient as it appears also (every times) with the DummySampler from our plugins (http://code.google.com/p/jmeter-plugins/). This sampler is basic and defined as "public class DummySampler extends AbstractSampler". The problem was introduced with 2.5.1, it was working fine with 2.4 and 2.5. Maybe this info can help to investigate the problem... Stef
(In reply to comment #6) > Created attachment 27591 [details] > Fix to the issue > > Hello, > Here is my fix that should be removed once bug is fixed in HTTPCLient. > > To workaround the pb related to a different sequence of calls on shutdown from > HTTPHC4Impl, I synchronize both request.abort(); and the for loop that > shutdowns the connection managers of a thread. > > It should not have an important performance impact since it only concerns stop > and shutdown sequences. > > Since I applied it I never reproduce issue. That's a neat solution to the HC4 deadlock issue; thanks.
(In reply to comment #7) > Hi, > > I don't think the bug is only related to HTTPClient as it appears also (every > times) with the DummySampler from our plugins > (http://code.google.com/p/jmeter-plugins/). This sampler is basic and defined > as "public class DummySampler extends AbstractSampler". > > The problem was introduced with 2.5.1, it was working fine with 2.4 and 2.5. > > Maybe this info can help to investigate the problem... What would really help would be the thread dump and jmeter log file for a deadlock.
What appears to be the issue is that the interrupt may(In reply to comment #5) > Hello, > For me the issue is an HTTPClient one. > It is not a problem that appears to be running threadFinished when interrupt is > still active. Actually, I think this is a general problem; however so far only HC4 seems to be affected by it. > The explanation for this would be this kind of sequence: > 1) Stopper thread (Thread-77 in my Thread Dump) first calls > StandardJMeterEngine.tellThreadsToStop which calls stop() => Thread is > suspended here > 2) "Thread Group 1-32" gets stop call , run thread is stopped and we enter > threadFinished => Thread is suspended > 3) Stopper thread enters interrupt call and calls abort which is suspended just > before AbstractClientConnAdapter.abortConnection > 4) "Thread Group 1-32" locks org.apache.http.impl.conn.SingleClientConnManager > through shutdown call The explanation I have is slightly different; threads are S (engine) and T (JMeterThread); numbers are time sequence: T1) active sample in progress S2) interrupt active sample T3) sample completes S4) [interrupt cleanup in progress] T5) thread completes S6) [interrupt cleanup still in progress] T7) shutdown invoked I tried adding a short pause between S6 and T7 and that fixed the problem for me, as it allowed the interrupt to complete. The fix in comment 6 works because it causes the JMeterThread code to wait until the engine interrupt has finished. I think the fix actually needs to be applied to JMeterThread instead so it applies to all samplers, not just HC4.
Hello Sebb, I think issue (Deadlock) only occurs with HC4 implementation because in this one we use for interrupt request#abort while in HC3 we use the same method used in shutdown. My fix was just for deadlock, by locking a HttpHC4Impl level, I ensure sequence that request.abort and ConnectionManager().shutdown() are never called at the same time which would cause the deadlock. I am not sure to understand the general issue you are talking about: - Is it that interrupt might be called after shutdown ? Regards Philippe Mouawad
(In reply to comment #11) > I am not sure to understand the general issue you are talking about: > - Is it that interrupt might be called after shutdown ? No, the issue is that shutdown might be called before interrupt has finished. But I don't think it's sensible to allow interrupt to be called after shutdown either.
Created attachment 27599 [details] Fix to JMeterThread Hello Sebb, Ok I agree. Here is my proposition of a patch to ensure interrupt goes to the end before shutdown gets called. I tested by removing my previous patch and it works. Regards Philippe Mouawad
(In reply to comment #13) > Created attachment 27599 [details] ... > Here is my proposition of a patch to ensure interrupt goes to the end before > shutdown gets called. Thanks, but it's interrupt() that needs to be protected. > I tested by removing my previous patch and it works. Yes, that will work because stop() happens to be called before interrupt() in this case. However, interrupt() is a public method and could be called from elsewhere without a preceeding stop() call. No need to redo the patch, I'm testing one now.
Patched JMeterThread: URL: http://svn.apache.org/viewvc?rev=1176077&view=rev Log: Bug 51888 - Occasional deadlock when stopping a testplan Modified: jakarta/jmeter/trunk/src/core/org/apache/jmeter/threads/JMeterThread.java jakarta/jmeter/trunk/xdocs/changes.xml == It would be helpful to have a DummySampler test case that fails on 2.5.1 RC2. I've been unable to generate deadlocks with DummySampler using either 2.5.1 RC2 or the current revision r1176077 (available from nightly builds dir)
(In reply to comment #15) > Patched JMeterThread: > > URL: http://svn.apache.org/viewvc?rev=1176077&view=rev > Log: > Bug 51888 - Occasional deadlock when stopping a testplan > > Modified: > jakarta/jmeter/trunk/src/core/org/apache/jmeter/threads/JMeterThread.java > jakarta/jmeter/trunk/xdocs/changes.xml > > == > > It would be helpful to have a DummySampler test case that fails on 2.5.1 RC2. > > I've been unable to generate deadlocks with DummySampler using either 2.5.1 RC2 > or the current revision r1176077 (available from nightly builds dir) Hi, Sorry I was wrong the problem with the DummySampler is not a deadlock, but rather related to Bug 51880 which I reopened and submitted logs/dumps. I have no deadlock, just threads waiting end of rampup phase before shuting down. Stef
Hello Sebb, I tested the fix and for me it introduces an issue: - I get the popup saying : "One or more test threads won't exit; see log file." To reproduce issue , I run with 3 threads on 15 s rampup. I let 2 thread start and call stop at that time, you will see that the stopper thread may stay alive. Furthermore I think there is something wrong in the fix: - number of lock/unlock don't match, is this regular ? - what forbidds stop from taking the lock before interrupt ? for me the latch did it because it was incremented in stop so run thread would be stopped until interrupt released the latch (I agree making that on public method may no be the best) that interrupt would countdown. But here if stop goes faster it will take the lock and interrupt will just wait, or maybe I have missed something. Regards Philippe Mouawad
(In reply to comment #17) > Hello Sebb, > I tested the fix and for me it introduces an issue: > - I get the popup saying : "One or more test threads won't exit; see log file." > > > To reproduce issue , I run with 3 threads on 15 s rampup. > I let 2 thread start and call stop at that time, you will see that the stopper > thread may stay alive. I don't get the problem. Are you running against the mirror, or a live site? Can you attach a thread dump and jmeter log (INFO will do)? > > Furthermore I think there is something wrong in the fix: > - number of lock/unlock don't match, is this regular ? The shutdown code takes the lock but does not release it. As the thread then exits, it should not matter, but we could add an unlock at the end. > - what forbidds stop from taking the lock before interrupt ? for me the latch > did it because it was incremented in stop so run thread would be stopped until > interrupt released the latch (I agree making that on public method may no be > the best) that interrupt would countdown. But here if stop goes faster it will > take the lock and interrupt will just wait, or maybe I have missed something. stop does not take the lock; interrupt does. It rechecks the currentSampler within the lock to ensure it cannot run after shutdown starts - if interrupt thread ran slowly it could see the value was non-null but the main thread could start shutdown before the interrupt lock started. The double-check prevents interrupt running after shutdown.
Created attachment 27604 [details] 2 Thread Dumps Hello, First thread dump was done while interrupt was still going on after I called stop. Then after some time threads ended but as you can see interrupt still waiting on lock. Regards Philippe
I test with TestPlan and mirror server on 8081.
(In reply to comment #19) > Created attachment 27604 [details] > 2 Thread Dumps > > Hello, > First thread dump was done while interrupt was still going on after I called > stop. > Then after some time threads ended but as you can see interrupt still waiting > on lock. I see; I'd not considered that. What happens if you call unlock at the end of the JMeterThread shutdown section? i.e. at the end of the finally block around lines 308-316. Does that allow interrupt to finish? > Regards > Philippe
Hello Sebb, I have finally the scenario that shows that fix does not make stop wait for interrupt. To reproduce it: 1) point to a web facing site (not mirror because Eclipse will go crazy) and configure 3 threads 2) Start JMeter in Debug mode 3) Put a breakpoint at line 310 and 585 4) Start test 5) When number of threads reaches 2, call stop 6) Eclipse will stop you at both 310 and 585 and depending on your choice interrupt will run before stop (OK) or stop will run before interrupt (NOT OK) I will attach log file, thread dump. Regards Philippe Mouawad
Created attachment 27605 [details] Thread dump showing interrupt holds lock for end of life
Created attachment 27606 [details] Log file
Added fix to release the lock: URL: http://svn.apache.org/viewvc?rev=1176291&view=rev Log: Release the interrupt lock to allow late interrupts to complete Modified: jakarta/jmeter/trunk/src/core/org/apache/jmeter/threads/JMeterThread.java Can you try the test again with this version? (It's in nightly builds).
This time no lock but shutdown of HttpClient Conn Manager occurs before interrupt so I get: 2011/09/27 11:22:06 WARN - jmeter.threads.JMeterThread: Caught Exception interrupting sampler: java.lang.NullPointerException To reproduce, do same scenario as described and wait 3 sec after thread 2 has started before calling stop. Regards Philippe
(In reply to comment #26) > This time no lock but shutdown of HttpClient Conn Manager occurs before > interrupt so I get: > 2011/09/27 11:22:06 WARN - jmeter.threads.JMeterThread: Caught Exception > interrupting sampler: java.lang.NullPointerException What is the actual NPE? It may be in the console. > > To reproduce, do same scenario as described and wait 3 sec after thread 2 has > started before calling stop. > Regards > Philippe
I added exception as parameter to logger.warn: 2011/09/27 12:26:10 WARN - jmeter.threads.JMeterThread: Caught Exception interrupting sampler: java.lang.NullPointerException java.lang.NullPointerException at org.apache.jmeter.threads.JMeterThread.interrupt(JMeterThread.java:602) at org.apache.jmeter.engine.StandardJMeterEngine.tellThreadsToStop(StandardJMeterEngine.java:552) at org.apache.jmeter.engine.StandardJMeterEngine.access$2(StandardJMeterEngine.java:547) at org.apache.jmeter.engine.StandardJMeterEngine$StopTest.run(StandardJMeterEngine.java:284) at java.lang.Thread.run(Thread.java:680) It's currentSampler that has been nullified by run finally block while interrupt is executing. Regards Philippe
(In reply to comment #28) > I added exception as parameter to logger.warn: > 2011/09/27 12:26:10 WARN - jmeter.threads.JMeterThread: Caught Exception > interrupting sampler: java.lang.NullPointerException > java.lang.NullPointerException > at org.apache.jmeter.threads.JMeterThread.interrupt(JMeterThread.java:602) > at > org.apache.jmeter.engine.StandardJMeterEngine.tellThreadsToStop(StandardJMeterEngine.java:552) > at > org.apache.jmeter.engine.StandardJMeterEngine.access$2(StandardJMeterEngine.java:547) > at > org.apache.jmeter.engine.StandardJMeterEngine$StopTest.run(StandardJMeterEngine.java:284) > at java.lang.Thread.run(Thread.java:680) > > > It's currentSampler that has been nullified by run finally block while > interrupt is executing. The code fetches the the value again, but fails to check it for null. That's a mistake; there should be a null check, which is used to skip the code when shutdown runs before interrup. No wonder there's an NPE... Committed another fix; hopefully that's enough now: URL: http://svn.apache.org/viewvc?rev=1176320&view=rev Log: Forgot to check if currentSampler had been set to null, e.g. by shutdown Modified: jakarta/jmeter/trunk/src/core/org/apache/jmeter/threads/JMeterThread.java
Works now, great, thanks. Although Sebb, we can still have shutdown called before interrupt, I thought you thought it was an issue. Regards Philippe
(In reply to comment #30) > Works now, great, thanks. > > Although Sebb, we can still have shutdown called before interrupt, I thought > you thought it was an issue. The thread sets currentSampler to null before running shutdown. So if interrupt runs afterwards, it will do nothing. Or do you see a problem with that?
It means shutdown of Conn Manager can occur while a sample is running (if server takes some time to respond). This will give an error like this I think: 2011/09/27 13:50:15 INFO - org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl$3: I/O exception (java.net.SocketException) caught when connecting to the target host: Software caused connection abort 2011/09/27 13:50:15 INFO - org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl$3: Retrying connect To reproduce put a Breakpoint in org.apache.http.impl.client.DefaultRequestDirector line 393 just before stopping test, then let run() method end, add breakpoint line 323 of JMeterThread and let JMeterThread end then let DefaultRequestDirector go and you will get Error% > 0. So it impacts a little the error % but I think it is acceptable. Philippe
(In reply to comment #32) > It means shutdown of Conn Manager can occur while a sample is running (if > server takes some time to respond). > This will give an error like this I think: > 2011/09/27 13:50:15 INFO - > org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl$3: I/O exception > (java.net.SocketException) caught when connecting to the target host: Software > caused connection abort > 2011/09/27 13:50:15 INFO - > org.apache.jmeter.protocol.http.sampler.HTTPHC4Impl$3: Retrying connect I think that's a bug in HC4 - it should not retry the connect once the request has been aborted, which is what I think is happening here. > > To reproduce put a Breakpoint in > org.apache.http.impl.client.DefaultRequestDirector line 393 just before > stopping test, then let run() method end, add breakpoint line 323 of > JMeterThread and let JMeterThread end then let DefaultRequestDirector go and > you will get Error% > 0. > > So it impacts a little the error % but I think it is acceptable. Yes, I'm pretty sure that the error will go away when the HC4 bug is fixed. It should be permissible to call abort() then shutdown() without generating a connect error message from HC4. The hack I added fixes the problem for request retry; I'm hoping I might be able to extend the fix to connect retries.
Created attachment 27609 [details] Patch to put unlock in a finally Hello Sebb, Then it is OK for me. Just a little note, I would put unlock inside a finally to ensure it is executed. Attached is the little patch for this. From my side, you can close issue as resolved. Regards Philippe Mouawad
(In reply to comment #34) > Created attachment 27609 [details] > Patch to put unlock in a finally > > Hello Sebb, > Then it is OK for me. > Just a little note, I would put unlock inside a finally to ensure it is > executed. > Attached is the little patch for this. URL: http://svn.apache.org/viewvc?rev=1176398&view=rev Log: Ensure unlock is called Modified: jakarta/jmeter/trunk/src/core/org/apache/jmeter/threads/JMeterThread.java
(In reply to comment #33) > > The hack I added fixes the problem for request retry; I'm hoping I might be > able to extend the fix to connect retries. URL: http://svn.apache.org/viewvc?rev=1176404&view=rev Log: More workrounds for HC4 bug Modified: jakarta/jmeter/trunk/src/protocol/http/org/apache/jmeter/protocol/http/sampler/HTTPHC4Impl.java
I hope this is now finally resolved.
This issue has been migrated to GitHub: https://github.com/apache/jmeter/issues/2564