Log4j 2
LOG4J2-279

Logging from log4j2 FlumeAppender with BerkeleyDB agent from Jetty webapp to Avro source with full queue raises ClosedByInterruptException

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0-beta7
    • Fix Version/s: 2.0-beta8
    • Component/s: Flume Appender
    • Labels:
      None

      Description

      Attempting to embed a Flume agent in another app does not work very well. I have put together a reproduction: a very simple Jetty webapp that uses the log4j2 FlumeAppender to connect to a subsequent Flume agent with a full channel.

      The impact is that I don't believe the BerkeleyDB agent can be safely used.

      Steps:
      1. Set up an additional Flume server (the subsequent server) with an Avro source and make its channel fill up (in my environment the subsequent server gets an OutOfMemoryError and then starts queueing events).
      2. Extract the enclosed project. Edit the flume-embedded-hot-deploy/src/main/resource/log4j2.xml and configure the Agent for the FlumeAppender with the details of the subsequent server.
      3. mvn clean install
      4. Change to flume-embedded-hot-deploy
      5. mvn clean package -P debug (note that you can set it to suspend until a debugger is attached with mvn clean package -P debug,suspend)
      6. Wait for Jetty to start up, then wait a few more seconds.

      Expected results:
      Some complaints about the subsequent server being full but an otherwise happy server.

      Actual results:
      When using the log4j2 Persistent agent (which uses Berkeley DB as a store):
      2013-06-03 14:01:14,804 INFO [main] server.AbstractConnector (AbstractConnector.java:265) - Started ServerConnector@75a213c0{HTTP/1.1}{0.0.0.0:8080}

      2013-06-03 14:01:22,779 DEBUG [Thread-3] ipc.NettyTransceiver (NettyTransceiver.java:314) - Disconnecting from collector1-sal-flex-van.dev-globalrelay.net/10.21.30.20:36892
      2013-06-03 14:01:22,789 ERROR An exception occurred processing Appender FlumeAppender org.apache.logging.log4j.LoggingException: Exception occurred writing log event
      at org.apache.logging.log4j.flume.appender.FlumePersistentManager.send(FlumePersistentManager.java:176)
      at org.apache.logging.log4j.flume.appender.FlumeAppender.append(FlumeAppender.java:86)
      at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:102)
      at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:424)
      at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:405)
      at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:366)
      at org.apache.logging.log4j.core.Logger.log(Logger.java:110)
      at org.apache.logging.log4j.spi.AbstractLoggerWrapper.log(AbstractLoggerWrapper.java:55)
      at org.slf4j.impl.SLF4JLogger.debug(SLF4JLogger.java:139)
      at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.handleUpstream(NettyTransceiver.java:491)
      at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
      at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
      at org.jboss.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:348)
      at org.jboss.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:230)
      at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:107)
      at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
      at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
      at org.jboss.netty.channel.Channels.fireChannelDisconnected(Channels.java:399)
      at org.jboss.netty.channel.Channels$4.run(Channels.java:389)
      at org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:41)
      at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processEventQueue(AbstractNioWorker.java:352)
      at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:236)
      at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)
      at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:722)
      Caused by: com.sleepycat.je.ThreadInterruptedException: (JE 5.0.73) Environment must be closed, caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 5.0.73) /var/local/flume/castellan-reader-berkeley-db Channel closed, may be due to thread interrupt THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
      at com.sleepycat.je.ThreadInterruptedException.wrapSelf(ThreadInterruptedException.java:99)
      at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1512)
      at com.sleepycat.je.Transaction.checkEnv(Transaction.java:850)
      at com.sleepycat.je.Transaction.abort(Transaction.java:204)
      at org.apache.logging.log4j.flume.appender.FlumePersistentManager.send(FlumePersistentManager.java:171)
      ... 26 more
      Caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 5.0.73) /var/local/flume/castellan-reader-berkeley-db Channel closed, may be due to thread interrupt THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
      at com.sleepycat.je.log.FileManager$LogEndFileDescriptor.force(FileManager.java:3054)
      at com.sleepycat.je.log.FileManager$LogEndFileDescriptor.access$500(FileManager.java:2710)
      at com.sleepycat.je.log.FileManager.syncLogEnd(FileManager.java:2022)
      at com.sleepycat.je.log.FSyncManager.executeFSync(FSyncManager.java:282)
      at com.sleepycat.je.log.FSyncManager.fsync(FSyncManager.java:233)
      at com.sleepycat.je.log.FileManager.groupSync(FileManager.java:2070)
      at com.sleepycat.je.log.LogManager.multiLog(LogManager.java:403)
      at com.sleepycat.je.log.LogManager.log(LogManager.java:335)
      at com.sleepycat.je.txn.Txn.logCommitEntry(Txn.java:957)
      at com.sleepycat.je.txn.Txn.commit(Txn.java:719)
      at com.sleepycat.je.txn.Txn.commit(Txn.java:584)
      at com.sleepycat.je.Transaction.commit(Transaction.java:317)
      at org.apache.logging.log4j.flume.appender.FlumePersistentManager.send(FlumePersistentManager.java:167)
      ... 26 more
      Caused by: java.nio.channels.ClosedByInterruptException
      at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
      at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:367)
      at com.sleepycat.je.log.FileManager$LogEndFileDescriptor.force(FileManager.java:3043)
      ... 38 more

      Suppositions:
      I believe this is an issue with the handling of a failed put or take: the failure path interacts badly with transaction rollback or with the Avro client.
      For example, the Avro client uses a SynchronousQueue as a rendezvous point, and that queue uses InterruptedException internally. The persistent store, however, writes through the NIO FileChannel (an AbstractInterruptibleChannel), which fails if its thread is interrupted. My debugging sessions show that java.util.concurrent.CountDownLatch is a common source of InterruptedExceptions, and CountDownLatch is also used by the Avro client.
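
      This interaction can be reproduced outside of Log4j and Berkeley DB entirely. The following standalone sketch (illustration only; the class name and file name are invented, and this is not code from the project) shows that interrupting a thread while it is writing to and forcing an NIO FileChannel closes the channel with ClosedByInterruptException, which is the same condition JE then reports as THREAD_INTERRUPTED:

      import java.io.IOException;
      import java.nio.ByteBuffer;
      import java.nio.channels.FileChannel;
      import java.nio.file.Paths;
      import java.nio.file.StandardOpenOption;

      // Illustration only: an interrupt delivered to a thread that is inside
      // FileChannel.write()/force() closes the channel and surfaces
      // java.nio.channels.ClosedByInterruptException.
      public class InterruptedFsyncDemo {
          public static void main(String[] args) throws Exception {
              Thread writer = new Thread(new Runnable() {
                  public void run() {
                      try (FileChannel ch = FileChannel.open(
                              Paths.get("interrupt-demo.dat"),  // throwaway scratch file
                              StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                          ByteBuffer buf = ByteBuffer.allocate(1024);
                          while (true) {
                              buf.clear();
                              ch.write(buf);
                              ch.force(true);  // an interrupt arriving around here invalidates the channel
                          }
                      } catch (IOException e) {
                          // Expected: java.nio.channels.ClosedByInterruptException
                          System.err.println("writer failed: " + e);
                      }
                  }
              });
              writer.start();
              Thread.sleep(200);   // let the writer reach the write/force loop
              writer.interrupt();  // simulates the interrupt delivered on the Avro/Netty thread
              writer.join();
          }
      }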

      Some docs:
      http://jira.codehaus.org/browse/JETTY-80
      http://www.oracle.com/technetwork/products/berkeleydb/if-097768.html


          Activity

          Ralph Goers added a comment -

          Resolved in revision 1499780. Note that there were two issues here:
          1. Interrupts were breaking Berkeley DB.
           2. Logging of Avro and Flume events to the Flume Appender causes an endless loop of logging events. As a consequence, the Flume Appender now filters out all org.apache.flume and org.apache.avro events.
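
           For reference, the essence of that filtering can be sketched as a plain logger-name prefix check. This is only an illustration of the approach described above, not the actual code from revision 1499780; the class and method names are invented:

           // Illustration only: drop events originating from the Flume/Avro transport
           // so the appender cannot recursively feed its own transport's log output
           // back through itself.
           final class TransportLoggerGate {
               private static final String[] EXCLUDED_PREFIXES = {
                   "org.apache.flume", "org.apache.avro"
               };

               /** Returns true if an event from this logger may be forwarded to Flume. */
               static boolean shouldForward(String loggerName) {
                   if (loggerName != null) {
                       for (String prefix : EXCLUDED_PREFIXES) {
                           if (loggerName.startsWith(prefix)) {
                               return false; // would loop straight back through the FlumeAppender
                           }
                       }
                   }
                   return true;
               }
           }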

          Ralph Goers added a comment -

           I ran the provided test and was able to duplicate the problem. However, I also got the following stack trace, which seems to indicate that the Netty thread is logging to Flume and is getting interrupted, most likely for the socket connection, and the first thing to detect the interrupt is Berkeley DB. I'm considering ways to deal with this, but it can easily be avoided by routing events from Avro to some other Appender.

          2013-06-25 15:44:16,836 ERROR An exception occurred processing Appender FlumeAppender org.apache.logging.log4j.LoggingException: Exception occurred writing log event
          at org.apache.logging.log4j.flume.appender.FlumePersistentManager.send(FlumePersistentManager.java:176)
          at org.apache.logging.log4j.flume.appender.FlumeAppender.append(FlumeAppender.java:86)
          at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:102)
          at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:424)
          at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:405)
          at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:366)
          at org.apache.logging.log4j.core.Logger.log(Logger.java:110)
          at org.apache.logging.log4j.spi.AbstractLoggerWrapper.log(AbstractLoggerWrapper.java:55)
          at org.slf4j.impl.SLF4JLogger.debug(SLF4JLogger.java:139)
          at org.apache.avro.ipc.NettyTransceiver$NettyClientAvroHandler.handleUpstream(NettyTransceiver.java:491)
          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
          at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:792)
          at org.jboss.netty.channel.SimpleChannelUpstreamHandler.channelUnbound(SimpleChannelUpstreamHandler.java:203)
          at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:100)
          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
          at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
          at org.jboss.netty.channel.Channels.fireChannelUnbound(Channels.java:436)
          at org.jboss.netty.channel.Channels$5.run(Channels.java:424)
          at org.jboss.netty.channel.socket.ChannelRunnableWrapper.run(ChannelRunnableWrapper.java:41)
          at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processEventQueue(AbstractNioWorker.java:352)
          at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:236)
          at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)
          at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
          at java.lang.Thread.run(Thread.java:722)
          Caused by: com.sleepycat.je.ThreadInterruptedException: (JE 5.0.73) Environment must be closed, caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 5.0.73) /tmp/castellan-reader-berkeley-db Thread interrupted prior to logging the commit THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
          at com.sleepycat.je.ThreadInterruptedException.wrapSelf(ThreadInterruptedException.java:99)
          at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1512)
          at com.sleepycat.je.Transaction.checkEnv(Transaction.java:850)
          at com.sleepycat.je.Transaction.abort(Transaction.java:204)
          at org.apache.logging.log4j.flume.appender.FlumePersistentManager.send(FlumePersistentManager.java:171)
          ... 25 more
          Caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 5.0.73) /tmp/castellan-reader-berkeley-db Thread interrupted prior to logging the commit THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
          at com.sleepycat.je.txn.Txn.preLogCommitCheck(Txn.java:1006)
          at com.sleepycat.je.txn.Txn.logCommitEntry(Txn.java:952)
          at com.sleepycat.je.txn.Txn.commit(Txn.java:719)
          at com.sleepycat.je.txn.Txn.commit(Txn.java:584)
          at com.sleepycat.je.Transaction.commit(Transaction.java:317)
          at org.apache.logging.log4j.flume.appender.FlumePersistentManager.send(FlumePersistentManager.java:167)
          ... 25 more


            People

            • Assignee:
              Unassigned
            • Reporter:
              Edward Sargisson
            • Votes:
              0
            • Watchers:
              2
