Log4j 2 / LOG4J2-1076

Flume appender fails to perform


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Abandoned

    Description

      Recently I was testing the Log4j 2 Flume appender's performance and thought to share the results, as I believe they reflect some bugs/flaws in the current implementation. I conducted a series of tests in which I used the same 2.2 GB base file. I wrote a small Java application that reads the file line by line and logs each line through Log4j 2 with a Flume appender, which sends it to another Flume instance on a remote machine. I measured the time it took and the traffic between the endpoints, as I was mainly curious about the Avro compression abilities.
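
      For reference, the test driver was essentially the following sketch (class name and timing output are illustrative, not the exact code):

          import java.io.BufferedReader;
          import java.nio.file.Files;
          import java.nio.file.Paths;

          import org.apache.logging.log4j.LogManager;
          import org.apache.logging.log4j.Logger;

          public class FlumeLoadTest {
              private static final Logger LOG = LogManager.getLogger(FlumeLoadTest.class);

              public static void main(String[] args) throws Exception {
                  long start = System.currentTimeMillis();
                  // args[0] points at the 2.2 GB base file
                  try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
                      String line;
                      while ((line = reader.readLine()) != null) {
                          LOG.info(line); // each line becomes one event for the configured Flume appender
                      }
                  }
                  System.out.println("Elapsed ms: " + (System.currentTimeMillis() - start));
              }
          }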

      First, Log4j 2's Flume appender supports only GZIP compression (and only for the event body), so I was curious whether this feature is actually compatible with Flume's deflate compression method for Avro. I found out that it isn't, and in order to make it work I would have to write my own GZIP decoder on the other side. (The configuration sketch after the Avro results below shows where this flag lives.)

      Type Avro:
      1. The process of sending the logs was VERY VERY long (over 2 hours), and it crashed several times.
      2. The more surprising part was that the traffic measured on the link was over 2 GB (when I used GZIP compression), and even closer to 3 GB (without compression). I am not even sure why there was such an overhead, but that's what I saw several times.
      3. The events were sent one by one even though I defined a batch size of 1000. After reading the code a little, I found out that batch mode is currently not supported and might only become possible in the next release - https://issues.apache.org/jira/browse/LOG4J2-1044?jql=project%20%3D%20LOG4J2%20AND%20priority%20%3D%20Major%20AND%20resolution%20%3D%20Unresolved%20AND%20text%20~%20%22flume%22%20ORDER%20BY%20key%20DESC.
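
      For reference, the appender configuration for the Avro runs was along the lines of the sketch below (host, port and layout are illustrative); this is where the compress flag mentioned earlier and the batch size of 1000 are set, and compress="true" was dropped for the uncompressed run:

          <?xml version="1.0" encoding="UTF-8"?>
          <Configuration status="warn">
            <Appenders>
              <!-- Avro client: sends events directly to the remote Flume agent -->
              <Flume name="flumeAvro" type="Avro" compress="true" batchSize="1000">
                <Agent host="remote.host" port="8800"/>
                <RFC5424Layout/>
              </Flume>
            </Appenders>
            <Loggers>
              <Root level="info">
                <AppenderRef ref="flumeAvro"/>
              </Root>
            </Loggers>
          </Configuration>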

      Type Persistent:
      1. Crashed several times after a little while and stopped sending messages. It didn't look like the Flume instance was the one that crashed, so my guess is that it was the BerkeleyDB thread or something like that; I could not figure out what exactly.
      2. Around the same time it crashed (which seems pretty connected to the issue) I got IO alerts stating Disk IO > 90%. Again, not sure why it happened, but it happened on several occasions and only when I tried the persistent type.
      3. Batch mode, though, did work.
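
      The Persistent runs used essentially the same appender definition with the type switched to Persistent and a local data directory for the write-ahead log (the path below is illustrative):

          <Flume name="flumePersistent" type="Persistent" dataDir="./flumeData" batchSize="1000">
            <Agent host="remote.host" port="8800"/>
            <RFC5424Layout/>
          </Flume>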

      Type Embedded:
      1. The documentation on the Log4j website does not reflect the way to configure this type. I had to work my way through the errors until I got my code to run, and even then it didn't really seem to send anything. Not sure why; I probably need to look deeper into it.
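
      For context, the style of configuration I attempted follows the manual's embedded sample, roughly as below (hosts are illustrative); whether this is actually the intended way to configure the Embedded type is exactly what is unclear to me:

          <Flume name="flumeEmbedded" type="Embedded" compress="true">
            <Agent host="remote.host" port="8800"/>
            <RFC5424Layout/>
          </Flume>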

      Since the Avro type is the only one that seems to work without a significant crash, I tested this mode of operation by adding a local Flume agent which gets the data from the Log4j 2 appender and ships it to the remote Flume using deflate compression. Using this setup it took 1276484 ms (~21 minutes).
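
      The local relay was a plain Flume agent configured roughly like this (names, ports and channel sizing are illustrative); the deflate compression is applied on the Avro sink towards the remote machine:

          agent.sources = avroIn
          agent.channels = mem
          agent.sinks = avroOut

          # Avro source receiving events from the Log4j 2 Flume appender
          agent.sources.avroIn.type = avro
          agent.sources.avroIn.bind = 0.0.0.0
          agent.sources.avroIn.port = 8800
          agent.sources.avroIn.channels = mem

          agent.channels.mem.type = memory
          agent.channels.mem.capacity = 10000

          # Avro sink forwarding to the remote Flume with deflate compression
          agent.sinks.avroOut.type = avro
          agent.sinks.avroOut.hostname = remote.host
          agent.sinks.avroOut.port = 8800
          agent.sinks.avroOut.compression-type = deflate
          agent.sinks.avroOut.channel = mem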

      Another important thing I wanted to point out: once I removed all the appenders, it took only 10781 ms (about 11 seconds) to read the file. With a file appender it took 99682 ms (about 1.7 minutes). So the performance drawback when using the Flume appender seems pretty huge, but it can probably be reduced using the async logger mode.
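
      One way to try that, assuming the LMAX Disruptor jar is on the classpath, is to either set -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector (all loggers asynchronous) or use an AsyncRoot in the configuration, e.g.:

          <Loggers>
            <!-- route everything through an async root logger instead of the synchronous Root -->
            <AsyncRoot level="info">
              <AppenderRef ref="flumeAvro"/>
            </AsyncRoot>
          </Loggers>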


      People

        Assignee: Unassigned
        Reporter: tezra tzachi
        Votes: 0
        Watchers: 4
