Flume / FLUME-697

ExecNioSource has an unbounded queue that can cause OOME.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: v0.9.4
    • Fix Version/s: v0.9.5
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      As with the RPC sources, the implementation of ExecNioSource contains an unbounded queue that can fill up and cause an OutOfMemoryError (OOME).
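
      The failure mode can be illustrated with a minimal sketch. It assumes a `java.util.concurrent.LinkedBlockingQueue` stands in for the source's internal queue; the class name `QueueGrowthSketch` and the helper `fill` are hypothetical and are not taken from the Flume code base. With no consumer draining it, an unbounded queue accepts every event and keeps growing, while a queue with a capacity refuses events once it is full.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration only; not the ExecNioSource implementation.
class QueueGrowthSketch {

    /** Offers `attempts` one-byte events with no consumer; returns how many were accepted. */
    static int fill(BlockingQueue<byte[]> queue, int attempts) {
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() never blocks: it returns false if the queue is full.
            if (queue.offer(new byte[] {'y'})) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        int attempts = 100_000;
        // No-arg constructor: capacity is Integer.MAX_VALUE, effectively unbounded.
        int unbounded = fill(new LinkedBlockingQueue<byte[]>(), attempts);
        // Explicit capacity: the queue holds at most 1000 events.
        int bounded = fill(new LinkedBlockingQueue<byte[]>(1000), attempts);
        System.out.println("unbounded queue accepted: " + unbounded); // 100000
        System.out.println("bounded queue accepted:   " + bounded);   // 1000
    }
}
```

      In the unbounded case every accepted event stays on the heap until something consumes it, so a producer that outpaces its consumer will eventually exhaust memory.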

          Activity

          Jonathan Hsieh added a comment -

          If one runs:

          $ bin/flume node_nowatch -n node -s -1 -c 'node:exec("yes") | delay(1000) console; '

          we end up in a situation where the yes program generates a large number of events (body = 'y') that are produced and queued by the exec source, while only one event is consumed per second (delay(1000) waits 1000 ms before each event is output to the console).

          After about 20-30s this shows up:

          grimlock [INFO Sat Jul 02 15:28:46 PDT 2011]

          { execcmd : yes }

          { procsource : STDOUT }

          { service : 1702389091 'exec' }

          y
          Exception in thread "ReaderThread (yes-STDOUT)" java.lang.OutOfMemoryError: Java heap space
          at java.nio.CharBuffer.wrap(CharBuffer.java:350)
          at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:238)
          at java.lang.StringCoding.encode(StringCoding.java:272)
          at java.lang.StringCoding.encode(StringCoding.java:284)
          at java.lang.String.getBytes(String.java:986)
          at com.cloudera.flume.core.Attributes.setString(Attributes.java:112)
          at com.cloudera.flume.handlers.exec.ExecNioSource.buildExecEvent(ExecNioSource.java:118)
          at com.cloudera.flume.handlers.exec.ExecNioSource.extractLines(ExecNioSource.java:166)
          at com.cloudera.flume.handlers.exec.ExecNioSource$ReaderThread.doLineMode(ExecNioSource.java:274)
          at com.cloudera.flume.handlers.exec.ExecNioSource$ReaderThread.run(ExecNioSource.java:411)
          grimlock [INFO Sat Jul 02 15:28:46 PDT 2011]

          { execcmd : yes }

          { procsource : STDOUT }

          { service : 1702389091 'exec' }

          y
          2011-07-02 15:31:48,261 [Thread-2] ERROR util.InputStreamPipe: Input stream pipe closed
          java.io.IOException: Broken pipe
          at sun.nio.ch.FileDispatcher.write0(Native Method)
          at sun.nio.ch.FileDispatcher.write(FileDispatcher.java:39)
          at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:72)
          at sun.nio.ch.IOUtil.write(IOUtil.java:43)
          at sun.nio.ch.SinkChannelImpl.write(SinkChannelImpl.java:149)
          at com.cloudera.util.InputStreamPipe$CopyThread.run(InputStreamPipe.java:112)
          grimlock [INFO Sat Jul 02 15:28:46 PDT 2011]

          { execcmd : yes }

          { procsource : STDOUT }

          { service : 1702389091 'exec' }

          y

          Jonathan Hsieh added a comment - edited

          After bounding the queue, the same process was still up after 30 minutes, at which point I killed it.
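
          A minimal sketch of what bounding the queue buys, assuming a blocking `put` on a `java.util.concurrent.ArrayBlockingQueue`; the actual change is in the linked review and may differ. The class name `BoundedQueueSketch`, the method `run`, and the capacity, event count, and delay values are all hypothetical. A fast producer (standing in for `yes`) is throttled by a slow consumer (standing in for the delayed console sink), so the queue depth never exceeds the capacity.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the patch applied to ExecNioSource.
class BoundedQueueSketch {

    /** Runs a fast producer against a slow consumer; returns the largest queue depth observed. */
    static int run(int capacity, final int events, long consumerDelayMs) {
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(capacity);
        final AtomicInteger maxDepth = new AtomicInteger();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < events; i++) {
                    queue.put("y"); // blocks while the queue is full (backpressure)
                    maxDepth.accumulateAndGet(queue.size(), Math::max);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        try {
            for (int i = 0; i < events; i++) {
                queue.take();                  // consume one event
                Thread.sleep(consumerDelayMs); // simulate the slow sink
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            producer.interrupt();
        }
        return maxDepth.get();
    }

    public static void main(String[] args) {
        int depth = run(8, 50, 5);
        System.out.println("max queue depth observed: " + depth + " (capacity 8)");
    }
}
```

          Blocking the reader thread is one possible policy; dropping events or timing out with `offer` are alternatives with different trade-offs.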

          Jonathan Hsieh added a comment -

          Review here: https://review.cloudera.org/r/1859/

          Jonathan Hsieh added a comment -

          Committed.


            People

            • Assignee:
              Jonathan Hsieh
              Reporter:
              Jonathan Hsieh