Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
1.4.0, 1.5.0, 1.6.0
None
Description
A client that submits syslog data malformed in certain ways can make SyslogUtils.extractEvent keep filling the ByteArrayOutputStream it uses to collect the event until the agent runs out of memory. Since the OOM condition affects the whole agent, a client sending such data, whether accidentally or maliciously, can disable the agent as long as it remains connected.
Note that this is probably only possible using SyslogTcpSource, although the fix touches common code in SyslogUtils.java.
The issue can happen in two ways:
Scenario 1: Send a message like this:
<> some more stuff here
This causes a NumberFormatException:
Sep 11, 2015 2:27:07 AM org.jboss.netty.channel.SimpleChannelHandler
WARNING: EXCEPTION, please implement org.apache.flume.source.SyslogTcpSource$syslogTcpHandler.exceptionCaught() for proper handling.
java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:504)
    at java.lang.Integer.parseInt(Integer.java:527)
    at org.apache.flume.source.SyslogUtils.buildEvent(SyslogUtils.java:198)
    at org.apache.flume.source.SyslogUtils.extractEvent(SyslogUtils.java:344)
    at org.apache.flume.source.SyslogTcpSource$syslogTcpHandler.messageReceived(SyslogTcpSource.java:76)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:94)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:364)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:238)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:38)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
This exception is not handled, and it is thrown before reset() can be called. As a result, the state machine in SyslogUtils gets stuck in the DATA state: all subsequent data is appended to the ByteArrayOutputStream (baos) while the exception above is logged repeatedly. Eventually the agent runs out of memory.
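A heavily simplified model of this failure mode is sketched below. StuckParserDemo, its Mode enum and feed() are hypothetical and are not the real SyslogUtils code; the only assumptions carried over are the ones the stack trace suggests, namely that the priority digits are parsed with Integer.parseInt when a frame ends and that reset() comes after that call.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of a syslog framing state machine.
// NOT the real SyslogUtils: it only shows why an exception thrown before
// reset() leaves the parser stuck in DATA with an ever-growing buffer.
public class StuckParserDemo {
    enum Mode { START, PRIO, DATA }

    Mode mode = Mode.START;
    final StringBuilder prio = new StringBuilder();
    final ByteArrayOutputStream baos = new ByteArrayOutputStream();

    /** Feeds one byte; returns the parsed priority when a line completes, else null. */
    Integer feed(byte b) {
        switch (mode) {
            case START:
                if (b == '<') mode = Mode.PRIO;
                return null;
            case PRIO:
                if (b == '>') mode = Mode.DATA;
                else prio.append((char) b);
                return null;
            default: // DATA
                if (b == '\n') {
                    int p = Integer.parseInt(prio.toString()); // throws for "<>"
                    reset(); // never reached if parseInt throws
                    return p;
                }
                baos.write(b);
                return null;
        }
    }

    void reset() {
        mode = Mode.START;
        prio.setLength(0);
        baos.reset();
    }

    public static void main(String[] args) {
        StuckParserDemo parser = new StuckParserDemo();
        byte[] input = "<> some more stuff here\n<13>next message\n"
                .getBytes(StandardCharsets.US_ASCII);
        for (byte b : input) {
            try {
                parser.feed(b);
            } catch (NumberFormatException e) {
                // The caller only logs, much like the unhandled Netty warning.
                System.out.println("caught: " + e.getMessage());
            }
        }
        System.out.println("mode=" + parser.mode + " buffered=" + parser.baos.size());
    }
}
```

The run ends with mode=DATA buffered=37: the second, well-formed message is swallowed into the same buffer, and every further newline just throws again.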
Scenario 2: Send some data like this:
<123...........
No length checking is done in the PRIO state, so a client could potentially exhaust the agent's memory this way too.
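One way to bound the PRIO state is to give up as soon as the header cannot be valid. RFC 5424 defines PRIVAL as one to three digits with a value from 0 to 191, so a parser never needs to hold more than that. The sketch below is hypothetical (BoundedPrio and parsePrio are illustrative names, not Flume API) and, for brevity, works on a complete byte array rather than byte-by-byte like SyslogUtils.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch (not Flume code): parse the "<PRI>" header with a hard
// length limit so a malformed header is rejected after a few bytes instead
// of being buffered indefinitely.
public class BoundedPrio {
    // RFC 5424: PRIVAL = 1*3DIGIT, range 0..191 (facility * 8 + severity).
    static final int MAX_PRIO_DIGITS = 3;
    static final int MAX_PRIO_VALUE = 191;

    /** Returns the PRI value, or -1 if the header is missing or malformed. */
    static int parsePrio(byte[] frame) {
        if (frame.length == 0 || frame[0] != '<') {
            return -1;
        }
        int value = 0;
        int digits = 0;
        for (int i = 1; i < frame.length; i++) {
            byte b = frame[i];
            if (b == '>') {
                // "<>" has no digits: reject it instead of calling parseInt("").
                return (digits == 0 || value > MAX_PRIO_VALUE) ? -1 : value;
            }
            // Stop at the first non-digit or at the 4th digit; nothing more
            // is read or buffered for this header.
            if (b < '0' || b > '9' || ++digits > MAX_PRIO_DIGITS) {
                return -1;
            }
            value = value * 10 + (b - '0');
        }
        return -1; // frame ended before the closing '>'
    }

    public static void main(String[] args) {
        String[] samples = {"<13>ok", "<>x", "<123...........", "<99999999>", "<192>x"};
        for (String s : samples) {
            System.out.println(s + " -> " + parsePrio(s.getBytes(StandardCharsets.US_ASCII)));
        }
    }
}
```

For the scenario 2 input, parsePrio returns -1 at the first '.' after "<123", and for the scenario 1 input "<>" it returns -1 without ever calling Integer.parseInt.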
I'm attaching a patch which handles both of these issues and adds more exception handling to buildEvent to make sure that reset() is called in future unforeseen situations.
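The defensive pattern described here can be sketched with try/finally: the expected NumberFormatException is caught and turned into an event marked invalid, and the finally block resets the parser even if some other, unforeseen exception escapes. ResetOnFailure, Event and the facility, severity and status header keys are hypothetical; this is an illustration of the idea, not the code from the attached patch.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the attached patch: buildEvent() always resets
// the parser state, whether it succeeds, fails in an expected way, or fails
// in an unforeseen way.
public class ResetOnFailure {
    enum Mode { START, PRIO, DATA }

    static final class Event {
        final Map<String, String> headers = new HashMap<>();
        byte[] body;
    }

    Mode mode = Mode.DATA; // pretend a complete frame has just been read
    final StringBuilder prio = new StringBuilder();
    final ByteArrayOutputStream baos = new ByteArrayOutputStream();

    Event buildEvent() {
        Event event = new Event();
        try {
            event.body = baos.toByteArray();
            int p = Integer.parseInt(prio.toString());
            // Syslog encoding: PRI = facility * 8 + severity.
            event.headers.put("facility", String.valueOf(p / 8));
            event.headers.put("severity", String.valueOf(p % 8));
        } catch (NumberFormatException e) {
            // Malformed PRI such as "<>": keep the raw body, flag the event.
            event.headers.put("status", "invalid");
        } finally {
            // Runs on every path, including exceptions not caught above,
            // so the parser can never be left stuck in DATA.
            reset();
        }
        return event;
    }

    void reset() {
        mode = Mode.START;
        prio.setLength(0);
        baos.reset();
    }

    public static void main(String[] args) {
        ResetOnFailure parser = new ResetOnFailure();
        // Simulate scenario 1: empty PRIO followed by some data.
        byte[] data = " some more stuff here".getBytes(StandardCharsets.US_ASCII);
        parser.baos.write(data, 0, data.length);
        Event event = parser.buildEvent();
        System.out.println(event.headers + " mode=" + parser.mode
                + " buffered=" + parser.baos.size());
    }
}
```

With the empty PRIO from scenario 1, the event comes back with status=invalid and the parser is back in START with an empty buffer, so the next message is parsed normally.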
Thanks also to roshan_naik for helping to make this patch better.