Summary: | Monitor deadlock in AsyncAppender | ||
---|---|---|---|
Product: | Log4j - Now in Jira | Reporter: | Boris Folgmann <boris> |
Component: | Appender | Assignee: | log4j-dev <log4j-dev> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | beckermanz, hallorant, jan.kronquist, kay.abendroth, schulz |
Priority: | P2 | ||
Version: | 1.2 | ||
Target Milestone: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Attachments: | Hotfix for solving the problem now. |
Description
Boris Folgmann
2006-01-05 15:04:09 UTC
I found a second situation where the buggy check bf.length() == 0 leads to a deadlock: Very often it happens that AsyncAppender.close() hangs like this: "TP-Processor251" daemon prio=1 tid=0x699f7ed8 nid=0x39b7 in Object.wait() [0x70202000..0x70203558] at java.lang.Object.wait(Native Method) - waiting on <0x4b4f6208> (a org.apache.log4j.Dispatcher) at java.lang.Thread.join(Thread.java:1095) - locked <0x4b4f6208> (a org.apache.log4j.Dispatcher) at java.lang.Thread.join(Thread.java:1148) at org.apache.log4j.AsyncAppender.close(AsyncAppender.java:144) and "Dispatcher-Thread-134" daemon prio=1 tid=0x08ebbaf8 nid=0x1bde in Object.wait() [0x6bc9c000..0x6bc9c658] at java.lang.Object.wait(Native Method) - waiting on <0x4d420e68> (a org.apache.log4j.helpers.BoundedFIFO) at java.lang.Object.wait(Object.java:474) at org.apache.log4j.Dispatcher.run(AsyncAppender.java:312) - locked <0x4d420e68> (a org.apache.log4j.helpers.BoundedFIFO) The reason is, that Dispatcher.close() DID NOT notify Dispatcher.run(), so Thread.join() in AsyncAppender.close() waits forever! Please fix that now, since AsyncAppender is totally unusable with that many concurrency bugs. Created attachment 17468 [details]
Hotfix for solving the problem now.
The patched version runs perfectly on our test and production servers.
Since there was no reaction :-(, we had to fix the problem for ourselves. I strongly urge everybody who uses the AsyncAppender to apply our patch to avoid deadlock/starvation problems. There might be possibilities to avoid some notify() calls, but it's always better to make no assumptions if a different thread is currently wait()ing or not. Even when you programmed both threads, this technique is more fault tolerant. The patch appears to be in some binary form. Can you provide a patch created by using diff? Thanks It's only gzipped, seems that the suffix (.gz) was lost. I would appreciate if you could check if new AsyncAppender implementation (see bug 38982) resolves the issues observed. See first patch file for instructions on modifying a log4j 1.2 build to use the new AsyncAppender. Any critique or comments would be appreciated, particularly if you would recommend incorporating the new AsyncAppender in a log4j 1.2 bug fix release or developing a more incremental modification. *** Bug 26224 has been marked as a duplicate of this bug. *** *** Bug 28006 has been marked as a duplicate of this bug. *** Still waiting for feedback if rewrite of log4j's AsyncAppender resolved your issue. The SVN for the log4j 1.2 branch now contains the new AsyncAppender, so you no longer have to back port the appender from log4j 1.3. Will eventually close this bug as resolved if there is no feedback. Curt, we have also experienced this exact problem in our production system. We are running a multi-clustered J2EE application. I would recommend increasing the priority on this Bug to P1 as this deadlock brings down our production system. We are implementing Boris' hotfix and will post the results as soon as they are available. If the hotfix is valid, I would also recommend a new release of log4j immediately to address the issue. We are currently using log4j 1.2.8. Thanks, Zev. *** Bug 37904 has been marked as a duplicate of this bug. *** (In reply to comment #10) > [..] If the hotfix is valid, I would also recommend a new > release of log4j immediately to address the issue. We are currently using log4j > 1.2.8. > Thanks, > Zev. Zev, does version 1.2.14 solve the issue for you? We have modified our development and system test environments to use the 1.2.14 version of log4j and we have not had any problems. We will deploy this new version to production in about 6-8 weeks. If we have issues, i will post them here. however, to date, we have not had any problems with the new version. Zev. If Boris has no issues with 1.2.14 and Zev doesn't re-reply to this bug within the next 12 weeks, this bug can be closed around February. No problem reported with 1.2.14 AsyncAppender, closing issue. (In reply to comment #6) I am seeing this bug in 1.2.14 version of log4j. We are running standalone java process. Here is the new thread dump: "Thread xxxx" prio=1 tid=0x0000002b3f8b4810 nid=0x6b91 in Object.wait() [0x0000000044270000..0x0000000044270c30] at java.lang.Object.wait(Native Method) - waiting on <0x0000002ade1969b0> (a java.util.ArrayList) at java.lang.Object.wait(Object.java:474) at org.apache.log4j.AsyncAppender.append(AsyncAppender.java:193) - locked < 0x0000002ade1969b0> (a java.util.ArrayList) at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:230) - locked < 0x0000002ade1dbd30> (a org.apache.log4j.AsyncAppender) at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders (AppenderAttachableImpl.java:65) at org.apache.log4j.Category.callAppenders(Category.java:203) - locked < 0x0000002ade21cf90> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.Category.forcedLog(Category.java:388) at org.apache.log4j.Category.log(Category.java:853) at org.apache.commons.logging.impl.Log4JLogger.debug(Log4JLogger.java:110) at org.apache.axis.encoding.DeserializationContextImpl.endElement (DeserializationContextImpl.java:1034) at org.apache.axis.message.SAX2EventRecorder.replay (SAX2EventRecorder.java:204) at org.apache.axis.message.MessageElement.publishToHandler (MessageElement.java:722) at org.apache.axis.message.RPCElement.deserialize(RPCElement.java:233) at org.apache.axis.message.RPCElement.getParams(RPCElement.java:347) at org.apache.axis.client.Call.invoke(Call.java:2272) at org.apache.axis.client.Call.invoke(Call.java:2171) at org.apache.axis.client.Call.invoke(Call.java:1691) at com.ml.marginhub.fipb.MHBridgeSoapBindingStub.getHairCut(Unknown Source) at java.lang.Thread.run(Thread.java:595) Comment #16 if it is a bug is a different bug and should be filed as a new bug. Waiting on line 193 is expected when blocking is set to true and the buffer is full. Waiting there doesn't by itself doesn't indicate a bug. Please retry using log4j 1.2.15 (only some slight changes since 1.2.14, but maybe enough) and if the problem still occurs please describe the situation more fully, such as any deadlocks on other threads, the nature of the wrapped appender, any status on the dispatch thread, and diagnostic messages on the console, etc. Please report if the situation is repeatable and if it occurs frequently, occasionally, etc. |