Log4j 2
  1. Log4j 2
  2. LOG4J2-391

Flume appender crashes when lock timeout occurs

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0-beta9
    • Fix Version/s: 2.1
    • Component/s: Appenders
    • Labels:
      None
    • Environment:

      Rhel 5

      Description

      Flume appender seems to crash when it can't acquire the lock on the embeded Berkeley DB before the timeout period. The flume appender should retry when this happens. Here is the stack trace:

      2013-08-20 22:40:36,315 -0700 level=INFO class=STDOUT 2013-08-20 22:40:36,314 ERROR Error reading database com.sleepycat.je.LockTimeoutException: (JE 5.0.73) Lock expired. Locker 988835149 -1_Thread-12_ThreadLocker: waited for lock on database=FlumePersistent[pprdgblas30i.ie.intuit.net:8800,pprdgblas30c.ie.intuit.net:8800,pprdgblas30b.ie.intuit.net:8800] /usr/local/whp-jboss-ws-5/server/default/flumeData LockAddr:2123888757 LSN=0x93/0x2d3184 type=READ grant=WAIT_NEW timeoutMillis=5000 startTime=1377063631308 endTime=1377063636310
      Owners: [<LockInfo locker="1275407544 1268918_DaemonPool-1-thread-2902_Txn" type="WRITE"/>]
      Waiters: []

      at com.sleepycat.je.txn.LockManager.newLockTimeoutException(LockManager.java:664)
      at com.sleepycat.je.txn.LockManager.makeTimeoutMsgInternal(LockManager.java:623)
      at com.sleepycat.je.txn.SyncedLockManager.makeTimeoutMsg(SyncedLockManager.java:97)
      at com.sleepycat.je.txn.LockManager.lockInternal(LockManager.java:390)
      at com.sleepycat.je.txn.LockManager.lock(LockManager.java:276)
      at com.sleepycat.je.txn.BasicLocker.lockInternal(BasicLocker.java:118)
      at com.sleepycat.je.txn.Locker.lock(Locker.java:443)
      at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2621)
      at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2422)
      at com.sleepycat.je.dbi.CursorImpl.fetchCurrent(CursorImpl.java:2253)
      at com.sleepycat.je.dbi.CursorImpl.getCurrentAlreadyLatched(CursorImpl.java:1466)
      at com.sleepycat.je.dbi.CursorImpl.getNext(CursorImpl.java:1593)
      at com.sleepycat.je.Cursor.retrieveNextAllowPhantoms(Cursor.java:2924)
      at com.sleepycat.je.Cursor.retrieveNextNoDups(Cursor.java:2801)
      at com.sleepycat.je.Cursor.retrieveNext(Cursor.java:2775)
      at com.sleepycat.je.Cursor.getNext(Cursor.java:1128)
      at org.apache.logging.log4j.flume.appender.FlumePersistentManager$WriterThread.sendBatch(FlumePersistentManager.java:568)
      at org.apache.logging.log4j.flume.appender.FlumePersistentManager$WriterThread.run(FlumePersistentManager.java:473)
      2013-08-20 22:40:56,947 -0700 level=INFO class=STDOUT 2ba4851d-0a24-11e3-9b42-005056b5071b 2013-08-20 22:40:56,946 ERROR An exception occurred processing Appender flume org.apache.logging.log4j.LoggingException: Unable to record event
      at org.apache.logging.log4j.flume.appender.FlumePersistentManager.send(FlumePersistentManager.java:172)
      at org.apache.logging.log4j.flume.appender.FlumeAppender.append(FlumeAppender.java:102)
      at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:102)
      at org.apache.logging.log4j.core.appender.rewrite.RewriteAppender.append(RewriteAppender.java:92)
      at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:102)
      at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:426)
      at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:407)
      at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:368)
      at org.apache.logging.log4j.core.Logger.log(Logger.java:110)
      at org.apache.logging.log4j.spi.AbstractLoggerWrapper.log(AbstractLoggerWrapper.java:55)
      at org.apache.logging.log4j.EventLogger.logEvent(EventLogger.java:56)
      at com.intuit.ifs.globallogging.dto.LogEventFactory$AuditProxy.invoke(LogEventFactory.java:242)
      at $Proxy161.logEvent(Unknown Source)
      at com.intuit.ifs.globallogging.fakeapp.LoggingController.startLogging(LoggingController.java:157)
      at sun.reflect.GeneratedMethodAccessor179.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
      at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
      at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
      at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
      at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
      at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
      at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:235)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
      at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:183)
      at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:95)
      at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.process(SecurityContextEstablishmentValve.java:126)
      at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:70)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
      at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
      at org.apache.catalina.authenticator.SingleSignOn.invoke(SingleSignOn.java:383)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
      at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.internalProcess(ActiveRequestResponseCacheValve.java:74)
      at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:47)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:330)
      at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:829)
      at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:599)
      at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:451)
      at java.lang.Thread.run(Thread.java:662)
      2013-08-20 22:40:57,559 -0700 level=INFO class=STDOUT 2be29f90-0a24-11e3-9b42-005056b5071b 2013-08-20 22:40:57,557 ERROR An exception occurred processing Appender flume org.apache.logging.log4j.LoggingException: Unable to record event
      at org.apache.logging.log4j.flume.appender.FlumePersistentManager.send(FlumePersistentManager.java:172)
      at org.apache.logging.log4j.flume.appender.FlumeAppender.append(FlumeAppender.java:102)
      at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:102)
      at org.apache.logging.log4j.core.appender.rewrite.RewriteAppender.append(RewriteAppender.java:92)
      at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:102)
      at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:426)
      at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:407)
      at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:368)
      at org.apache.logging.log4j.core.Logger.log(Logger.java:110)
      at org.apache.logging.log4j.spi.AbstractLoggerWrapper.log(AbstractLoggerWrapper.java:55)
      at org.apache.logging.log4j.EventLogger.logEvent(EventLogger.java:56)
      at com.intuit.ifs.globallogging.dto.LogEventFactory$AuditProxy.invoke(LogEventFactory.java:242)
      at $Proxy199.logEvent(Unknown Source)
      at com.intuit.ifs.globallogging.fakeapp.LoggingController.startLogging(LoggingController.java:157)
      at sun.reflect.GeneratedMethodAccessor179.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
      at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
      at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
      at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
      at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
      at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
      at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:235)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
      at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:183)
      at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:95)
      at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.process(SecurityContextEstablishmentValve.java:126)
      at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:70)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
      at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
      at org.apache.catalina.authenticator.SingleSignOn.invoke(SingleSignOn.java:383)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
      at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.internalProcess(ActiveRequestResponseCacheValve.java:74)
      at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:47)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:330)
      at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:829)
      at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:599)
      at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:451)
      at java.lang.Thread.run(Thread.java:662)
      2013-08-20 22:40:58,641 -0700 level=INFO class=STDOUT 3060d4b8-0a24-11e3-9b42-005056b5071b 2013-08-20 22:40:58,639 ERROR An exception occurred processing Appender flume org.apache.logging.log4j.LoggingException: Unable to record event
      at org.apache.logging.log4j.flume.appender.FlumePersistentManager.send(FlumePersistentManager.java:172)
      at org.apache.logging.log4j.flume.appender.FlumeAppender.append(FlumeAppender.java:102)
      at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:102)
      at org.apache.logging.log4j.core.appender.rewrite.RewriteAppender.append(RewriteAppender.java:92)
      at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:102)
      at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:426)
      at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:407)
      at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:368)
      at org.apache.logging.log4j.core.Logger.log(Logger.java:110)
      at org.apache.logging.log4j.spi.AbstractLoggerWrapper.log(AbstractLoggerWrapper.java:55)
      at org.apache.logging.log4j.EventLogger.logEvent(EventLogger.java:56)
      at com.intuit.ifs.globallogging.dto.LogEventFactory$AuditProxy.invoke(LogEventFactory.java:242)
      at $Proxy162.logEvent(Unknown Source)
      at com.intuit.ifs.globallogging.fakeapp.LoggingController.startLogging(LoggingController.java:157)
      at sun.reflect.GeneratedMethodAccessor179.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
      at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
      at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
      at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
      at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
      at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
      at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:778)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:235)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
      at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:183)
      at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:95)
      at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.process(SecurityContextEstablishmentValve.java:126)
      at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:70)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
      at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
      at org.apache.catalina.authenticator.SingleSignOn.invoke(SingleSignOn.java:383)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
      at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.internalProcess(ActiveRequestResponseCacheValve.java:74)
      at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:47)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:330)
      at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:829)
      at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:599)
      at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:451)
      at java.lang.Thread.run(Thread.java:662)

      1. logging-log4j2-391-1.patch
        3 kB
        Kamal Bahadur
      2. logging-log4j2-391.patch
        3 kB
        Kamal Bahadur
      3. log4j2-391_patch.txt
        19 kB
        Kamal Bahadur

        Activity

        Hide
        Kamal Bahadur added a comment -

        I am closing this issue as I am happy with the fix.

        Show
        Kamal Bahadur added a comment - I am closing this issue as I am happy with the fix.
        Hide
        Kamal Bahadur added a comment - - edited

        No problem Gary. I tested the flume appender with 3.5 millions logs and it worked fine. I did not see any time out exceptions . I am closing this issue.

        Show
        Kamal Bahadur added a comment - - edited No problem Gary. I tested the flume appender with 3.5 millions logs and it worked fine. I did not see any time out exceptions . I am closing this issue.
        Hide
        Gary Gregory added a comment - - edited

        Great Kamal, thank you for going through the extra steps

        Show
        Gary Gregory added a comment - - edited Great Kamal, thank you for going through the extra steps
        Hide
        Kamal Bahadur added a comment -

        Ok Gary, I will do that.

        Show
        Kamal Bahadur added a comment - Ok Gary, I will do that.
        Hide
        Gary Gregory added a comment -

        Kamal Bahadur: You best bet is to clone the repository from Git and do a local build. This will make testing much easier. You can install in your local Maven repo with "mvn clean install".

        Gary

        Show
        Gary Gregory added a comment - Kamal Bahadur : You best bet is to clone the repository from Git and do a local build. This will make testing much easier. You can install in your local Maven repo with "mvn clean install". Gary
        Hide
        Kamal Bahadur added a comment -

        I was going to test the changes with the snapshot version from apache but it looks like the snapshots there are old. How often does the snapshots get generated?

        I am looking at http://repository.apache.org/content/groups/snapshots/org/apache/logging/log4j/log4j-flume-ng/2.1-SNAPSHOT/

        Show
        Kamal Bahadur added a comment - I was going to test the changes with the snapshot version from apache but it looks like the snapshots there are old. How often does the snapshots get generated? I am looking at http://repository.apache.org/content/groups/snapshots/org/apache/logging/log4j/log4j-flume-ng/2.1-SNAPSHOT/
        Hide
        Kamal Bahadur added a comment -

        Thanks guys! I will run another round of performance test today.

        Show
        Kamal Bahadur added a comment - Thanks guys! I will run another round of performance test today.
        Hide
        Ralph Goers added a comment -

        Gary, thanks for the hint. I was able to apply the patch through IntelliJ. Kanal, pleaes verify and close.

        Show
        Ralph Goers added a comment - Gary, thanks for the hint. I was able to apply the patch through IntelliJ. Kanal, pleaes verify and close.
        Hide
        Ralph Goers added a comment - - edited

        Matt. I tried patch -p1 before I tried the git apply. I will try with the second file he uploaded but at a glance it looked the same as the first file. I have no idea why it didn't work for me. I'll review the patch, but I used to work with Kamal and I have no doubt he tested it.

        Show
        Ralph Goers added a comment - - edited Matt. I tried patch -p1 before I tried the git apply. I will try with the second file he uploaded but at a glance it looked the same as the first file. I have no idea why it didn't work for me. I'll review the patch, but I used to work with Kamal and I have no doubt he tested it.
        Hide
        Gary Gregory added a comment -

        The patch applies fine from Eclipse's Team>Apply patch... The unit tests pass. I've not done a code review on the patch though.

        Show
        Gary Gregory added a comment - The patch applies fine from Eclipse's Team>Apply patch... The unit tests pass. I've not done a code review on the patch though.
        Hide
        Matt Sicker added a comment -

        Patch looks like it should work with git apply. If not, use patch -p1 <file.patch

        Show
        Matt Sicker added a comment - Patch looks like it should work with git apply. If not, use patch -p1 <file.patch
        Hide
        Kamal Bahadur added a comment -

        I am not sure what is wrong. I just tried to create the patch again.

        Show
        Kamal Bahadur added a comment - I am not sure what is wrong. I just tried to create the patch again.
        Hide
        Ralph Goers added a comment -

        I tried doing
        git apply ../logging-log4j2-391.patch

        and got

        ../logging-log4j2-391.patch:19: space before tab in indent.
        final BatchEvent batch = new BatchEvent();
        ../logging-log4j2-391.patch:20: space before tab in indent.
        for (int retryIndex = 0; retryIndex < lockTimeoutRetryCount; ++retryIndex) {
        ../logging-log4j2-391.patch:21: space before tab in indent.
        try {
        ../logging-log4j2-391.patch:22: space before tab in indent.
        cursor = database.openCursor(null, CursorConfig.DEFAULT);
        ../logging-log4j2-391.patch:24: trailing whitespace.

        fatal: corrupt patch at line 63

        Show
        Ralph Goers added a comment - I tried doing git apply ../logging-log4j2-391.patch and got ../logging-log4j2-391.patch:19: space before tab in indent. final BatchEvent batch = new BatchEvent(); ../logging-log4j2-391.patch:20: space before tab in indent. for (int retryIndex = 0; retryIndex < lockTimeoutRetryCount; ++retryIndex) { ../logging-log4j2-391.patch:21: space before tab in indent. try { ../logging-log4j2-391.patch:22: space before tab in indent. cursor = database.openCursor(null, CursorConfig.DEFAULT); ../logging-log4j2-391.patch:24: trailing whitespace. fatal: corrupt patch at line 63
        Hide
        Kamal Bahadur added a comment - - edited

        I synced the latest code from git before I made the changes. Let me double check

        I just compared the changes with head revision and it looks ok to me.

        Show
        Kamal Bahadur added a comment - - edited I synced the latest code from git before I made the changes. Let me double check I just compared the changes with head revision and it looks ok to me.
        Hide
        Ralph Goers added a comment -

        Are you sure the patch file is good? I am having trouble applying it - but it might be me.

        Show
        Ralph Goers added a comment - Are you sure the patch file is good? I am having trouble applying it - but it might be me.
        Hide
        Kamal Bahadur added a comment -

        Attached is the patch to handle time out exception during batch send.

        Show
        Kamal Bahadur added a comment - Attached is the patch to handle time out exception during batch send.
        Hide
        Kamal Bahadur added a comment -

        LockTimeoutException has not been handled for batch. In sendBatch method of FlumePersistentManager.WriterThread, the methods cursor.getFirst and cursor.getNext are not catching LockConflictException. We should catch the LockConflictException and include the retry logic just like we implemented in send method.

        Show
        Kamal Bahadur added a comment - LockTimeoutException has not been handled for batch. In sendBatch method of FlumePersistentManager.WriterThread, the methods cursor.getFirst and cursor.getNext are not catching LockConflictException. We should catch the LockConflictException and include the retry logic just like we implemented in send method.
        Hide
        Ralph Goers added a comment -

        I am marking this as resolved. Please close it when you have verified the current release.

        Show
        Ralph Goers added a comment - I am marking this as resolved. Please close it when you have verified the current release.
        Hide
        Steve Sulatycki added a comment -

        Ralph,

        I pieced through this issue over the weekend and indeed the log4j flume jar was the culprit - it was beta7.

        Thought I haven't actually verified the lock timeout fix, please feel free to close this issue out. We'll deploy the code soon and I'll update is indeed the issue persists.

        Thanks and sorry for the confusion on this.

        Steve

        Show
        Steve Sulatycki added a comment - Ralph, I pieced through this issue over the weekend and indeed the log4j flume jar was the culprit - it was beta7. Thought I haven't actually verified the lock timeout fix, please feel free to close this issue out. We'll deploy the code soon and I'll update is indeed the issue persists. Thanks and sorry for the confusion on this. Steve
        Hide
        Ralph Goers added a comment - - edited

        I am having a hard time looking into this. The stack trace above is definitely not for beta9 - nothing lines up. I understand you believe you verified checksums but somehow you are not running beta9.

        I notice you don't have the log4j flume jar in the list of MD5's you checked.

        Show
        Ralph Goers added a comment - - edited I am having a hard time looking into this. The stack trace above is definitely not for beta9 - nothing lines up. I understand you believe you verified checksums but somehow you are not running beta9. I notice you don't have the log4j flume jar in the list of MD5's you checked.
        Hide
        Steve Sulatycki added a comment - - edited

        Kamal,

        I was hoping that could be the case, so I downloaded beta9, and compared that to what we have in our repo by running md5 checksums:

        Fresh beta9 download:

        MD5 (log4j-1.2-api-2.0-beta9.jar) = 8d44e2635d2efebeed217b068190912b
        MD5 (log4j-api-2.0-beta9.jar) = 243049406066e313938e168335069d32
        MD5 (log4j-core-2.0-beta9.jar) = 152ecb3ce094ac5bc9ea39d6122e2814
        MD5 (log4j-jcl-2.0-beta9.jar) = 364b72e29a557e58fdd709e665fdaf8d
        MD5 (log4j-slf4j-impl-2.0-beta9.jar) = 0df94ba36ac62c9ace4c5216cae8c87d
        MD5 (log4j-taglib-2.0-beta9.jar) = f7c919ea26d83878d5773048c8e0f7b1

        In local repo (and on servers throwing exceptions...):

        MD5 (log4j-1.2-api-2.0.jar) = 8d44e2635d2efebeed217b068190912b
        MD5 (log4j-api-2.0.jar) = 243049406066e313938e168335069d32
        MD5 (log4j-core-2.0.jar) = 152ecb3ce094ac5bc9ea39d6122e2814
        MD5 (log4j-jcl-2.0.jar) = 364b72e29a557e58fdd709e665fdaf8d
        MD5 (log4j-slf4j-impl-2.0.jar) = 0df94ba36ac62c9ace4c5216cae8c87d
        MD5 (log4j-taglib-2.0.jar) = f7c919ea26d83878d5773048c8e0f7b1

        All those checksums appear to match.

        Here's what our Flume appender config looks like:

        <Flume
        name="flume"
        type="persistent"
        compress="true"
        suppressExceptions="true"
        batchSize="10"
        maxDelay="15050"
        connectTimeout="5000"
        requestTimeout="10000"
        dataDir="/logs/flume">
        <Agent host="agent01" port="44444"/>
        <Agent host="agent02" port="44444"/>
        <PatternLayout pattern="%d

        {yyyyMMddHHmmssSSS}

        |1|%m%throwable

        {none}

        " />
        </Flume>

        Show
        Steve Sulatycki added a comment - - edited Kamal, I was hoping that could be the case, so I downloaded beta9, and compared that to what we have in our repo by running md5 checksums: Fresh beta9 download: MD5 (log4j-1.2-api-2.0-beta9.jar) = 8d44e2635d2efebeed217b068190912b MD5 (log4j-api-2.0-beta9.jar) = 243049406066e313938e168335069d32 MD5 (log4j-core-2.0-beta9.jar) = 152ecb3ce094ac5bc9ea39d6122e2814 MD5 (log4j-jcl-2.0-beta9.jar) = 364b72e29a557e58fdd709e665fdaf8d MD5 (log4j-slf4j-impl-2.0-beta9.jar) = 0df94ba36ac62c9ace4c5216cae8c87d MD5 (log4j-taglib-2.0-beta9.jar) = f7c919ea26d83878d5773048c8e0f7b1 In local repo (and on servers throwing exceptions...): MD5 (log4j-1.2-api-2.0.jar) = 8d44e2635d2efebeed217b068190912b MD5 (log4j-api-2.0.jar) = 243049406066e313938e168335069d32 MD5 (log4j-core-2.0.jar) = 152ecb3ce094ac5bc9ea39d6122e2814 MD5 (log4j-jcl-2.0.jar) = 364b72e29a557e58fdd709e665fdaf8d MD5 (log4j-slf4j-impl-2.0.jar) = 0df94ba36ac62c9ace4c5216cae8c87d MD5 (log4j-taglib-2.0.jar) = f7c919ea26d83878d5773048c8e0f7b1 All those checksums appear to match. Here's what our Flume appender config looks like: <Flume name="flume" type="persistent" compress="true" suppressExceptions="true" batchSize="10" maxDelay="15050" connectTimeout="5000" requestTimeout="10000" dataDir="/logs/flume"> <Agent host="agent01" port="44444"/> <Agent host="agent02" port="44444"/> <PatternLayout pattern="%d {yyyyMMddHHmmssSSS} |1|%m%throwable {none} " /> </Flume>
        Hide
        Kamal Bahadur added a comment -

        Stack trace matches with the source in beta8 instead of beta9. Steve, can you double check the version you are on?

        Kamal

        Show
        Kamal Bahadur added a comment - Stack trace matches with the source in beta8 instead of beta9. Steve, can you double check the version you are on? Kamal
        Hide
        Steve Sulatycki added a comment -

        Though this has been closed, this issue still persists in beta9. We have multiple servers/services that have had this error, which effectively stops the Flume logging.

        Version: beta9

        Config: Running async loggers, with the following parameters:
        -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
        -DAsyncLogger.RingBufferSize=512
        -DAsyncLogger.WaitStrategy=Block

        OSes:
        1. SunOS 5.10 Generic_141445-09 i86pc i386 i86pc
        2. Linux 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux

        Error seen in logs:

        2014-02-04 16:00:19,945 ERROR Error reading database com.sleepycat.je.LockTimeoutException: (JE 5.0.73) Lock expired. Locker 31404296 -1_Thread-3_ThreadLocker: waited for lock on database=FlumePersistent[agent01:44444,agent02:44444] /logs/flume LockAddr:793535 LSN=0x5f/0x7004b8 type=READ grant=WAIT_NEW timeoutMillis=5000 startTime=1391529614658 endTime=1391529619941
        Owners: [<LockInfo locker="28110478 856908_DaemonPool-1-thread-36_Txn" type="WRITE"/>]
        Waiters: []

        at com.sleepycat.je.txn.LockManager.newLockTimeoutException(LockManager.java:664)
        at com.sleepycat.je.txn.LockManager.makeTimeoutMsgInternal(LockManager.java:623)
        at com.sleepycat.je.txn.SyncedLockManager.makeTimeoutMsg(SyncedLockManager.java:97)
        at com.sleepycat.je.txn.LockManager.lockInternal(LockManager.java:390)
        at com.sleepycat.je.txn.LockManager.lock(LockManager.java:276)
        at com.sleepycat.je.txn.BasicLocker.lockInternal(BasicLocker.java:118)
        at com.sleepycat.je.txn.Locker.lock(Locker.java:443)
        at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2621)
        at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2422)
        at com.sleepycat.je.dbi.CursorImpl.fetchCurrent(CursorImpl.java:2253)
        at com.sleepycat.je.dbi.CursorImpl.getCurrentAlreadyLatched(CursorImpl.java:1466)
        at com.sleepycat.je.dbi.CursorImpl.getNext(CursorImpl.java:1593)
        at com.sleepycat.je.Cursor.retrieveNextAllowPhantoms(Cursor.java:2924)
        at com.sleepycat.je.Cursor.retrieveNextNoDups(Cursor.java:2801)
        at com.sleepycat.je.Cursor.retrieveNext(Cursor.java:2775)
        at com.sleepycat.je.Cursor.getNext(Cursor.java:1128)
        at org.apache.logging.log4j.flume.appender.FlumePersistentManager$WriterThread.run(FlumePersistentManager.java:460)

        2014-02-04 16:00:21,591 ERROR An exception occurred processing Appender flume org.apache.logging.log4j.LoggingException: Unable to record event
        at org.apache.logging.log4j.flume.appender.FlumePersistentManager.send(FlumePersistentManager.java:166)
        at org.apache.logging.log4j.flume.appender.FlumeAppender.append(FlumeAppender.java:102)
        at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:102)
        at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:426)
        at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:407)
        at org.apache.logging.log4j.core.Logger$PrivateConfig.logEvent(Logger.java:273)
        at org.apache.logging.log4j.core.async.AsyncLogger.actualAsyncLog(AsyncLogger.java:240)
        at org.apache.logging.log4j.core.async.RingBufferLogEvent.execute(RingBufferLogEvent.java:97)
        at org.apache.logging.log4j.core.async.RingBufferLogEventHandler.onEvent(RingBufferLogEventHandler.java:43)
        at org.apache.logging.log4j.core.async.RingBufferLogEventHandler.onEvent(RingBufferLogEventHandler.java:28)
        at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

        I would suggest that the retry and timeout parameters be made configurable so that they may be tuned for specific needs.

        Show
        Steve Sulatycki added a comment - Though this has been closed, this issue still persists in beta9. We have multiple servers/services that have had this error, which effectively stops the Flume logging. Version: beta9 Config: Running async loggers, with the following parameters: -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -DAsyncLogger.RingBufferSize=512 -DAsyncLogger.WaitStrategy=Block OSes: 1. SunOS 5.10 Generic_141445-09 i86pc i386 i86pc 2. Linux 2.6.32-220.el6.x86_64 #1 SMP Tue Dec 6 19:48:22 GMT 2011 x86_64 x86_64 x86_64 GNU/Linux Error seen in logs: 2014-02-04 16:00:19,945 ERROR Error reading database com.sleepycat.je.LockTimeoutException: (JE 5.0.73) Lock expired. Locker 31404296 -1_Thread-3_ThreadLocker: waited for lock on database=FlumePersistent [agent01:44444,agent02:44444] /logs/flume LockAddr:793535 LSN=0x5f/0x7004b8 type=READ grant=WAIT_NEW timeoutMillis=5000 startTime=1391529614658 endTime=1391529619941 Owners: [<LockInfo locker="28110478 856908_DaemonPool-1-thread-36_Txn" type="WRITE"/>] Waiters: [] at com.sleepycat.je.txn.LockManager.newLockTimeoutException(LockManager.java:664) at com.sleepycat.je.txn.LockManager.makeTimeoutMsgInternal(LockManager.java:623) at com.sleepycat.je.txn.SyncedLockManager.makeTimeoutMsg(SyncedLockManager.java:97) at com.sleepycat.je.txn.LockManager.lockInternal(LockManager.java:390) at com.sleepycat.je.txn.LockManager.lock(LockManager.java:276) at com.sleepycat.je.txn.BasicLocker.lockInternal(BasicLocker.java:118) at com.sleepycat.je.txn.Locker.lock(Locker.java:443) at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2621) at com.sleepycat.je.dbi.CursorImpl.lockLN(CursorImpl.java:2422) at com.sleepycat.je.dbi.CursorImpl.fetchCurrent(CursorImpl.java:2253) at com.sleepycat.je.dbi.CursorImpl.getCurrentAlreadyLatched(CursorImpl.java:1466) at com.sleepycat.je.dbi.CursorImpl.getNext(CursorImpl.java:1593) at com.sleepycat.je.Cursor.retrieveNextAllowPhantoms(Cursor.java:2924) at com.sleepycat.je.Cursor.retrieveNextNoDups(Cursor.java:2801) at com.sleepycat.je.Cursor.retrieveNext(Cursor.java:2775) at com.sleepycat.je.Cursor.getNext(Cursor.java:1128) at org.apache.logging.log4j.flume.appender.FlumePersistentManager$WriterThread.run(FlumePersistentManager.java:460) 2014-02-04 16:00:21,591 ERROR An exception occurred processing Appender flume org.apache.logging.log4j.LoggingException: Unable to record event at org.apache.logging.log4j.flume.appender.FlumePersistentManager.send(FlumePersistentManager.java:166) at org.apache.logging.log4j.flume.appender.FlumeAppender.append(FlumeAppender.java:102) at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:102) at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:426) at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:407) at org.apache.logging.log4j.core.Logger$PrivateConfig.logEvent(Logger.java:273) at org.apache.logging.log4j.core.async.AsyncLogger.actualAsyncLog(AsyncLogger.java:240) at org.apache.logging.log4j.core.async.RingBufferLogEvent.execute(RingBufferLogEvent.java:97) at org.apache.logging.log4j.core.async.RingBufferLogEventHandler.onEvent(RingBufferLogEventHandler.java:43) at org.apache.logging.log4j.core.async.RingBufferLogEventHandler.onEvent(RingBufferLogEventHandler.java:28) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:133) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) I would suggest that the retry and timeout parameters be made configurable so that they may be tuned for specific needs.
        Hide
        Ralph Goers added a comment -

        I applied a modified version of the patch in revision 1521640. Please verify and close.

        Show
        Ralph Goers added a comment - I applied a modified version of the patch in revision 1521640. Please verify and close.
        Hide
        Kamal Bahadur added a comment -

        Made the number of retires configurable

        Show
        Kamal Bahadur added a comment - Made the number of retires configurable
        Hide
        Kamal Bahadur added a comment -

        I'll make it configurable

        Show
        Kamal Bahadur added a comment - I'll make it configurable
        Hide
        Gary Gregory added a comment -

        I'll let Ralph opine on the patch but I would expect these retry values to be configurable from log4j.xml.

        Show
        Gary Gregory added a comment - I'll let Ralph opine on the patch but I would expect these retry values to be configurable from log4j.xml.
        Hide
        Kamal Bahadur added a comment -

        Attached is a patch to fix this bug. Basically, I have added a logic to sleep for 500 milliseconds and retry if lock related exception occurs. It will retry 100 times before setting the errors to true. This should prevent the FlumePersistentManager to shutdown.

        Show
        Kamal Bahadur added a comment - Attached is a patch to fix this bug. Basically, I have added a logic to sleep for 500 milliseconds and retry if lock related exception occurs. It will retry 100 times before setting the errors to true. This should prevent the FlumePersistentManager to shutdown.
        Hide
        Kamal Bahadur added a comment -
        Show
        Kamal Bahadur added a comment - This could be related: https://forums.oracle.com/thread/2426617

          People

          • Assignee:
            Unassigned
            Reporter:
            Kamal Bahadur
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development