  1. Flume
  2. FLUME-2905

NetcatSource - Socket not closed when an exception is encountered during start() leading to file descriptor leaks

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 1.8.0
    • Component/s: Sinks+Sources
    • Labels: None

    Description

      During Flume agent start-up, the configuration containing the NetcatSource is parsed and the source's start() is called. If binding the server channel's socket to the local address fails, the following exception is thrown, but the socket opened just before the bind attempt is not closed.

      2016-05-01 03:04:37,273 ERROR org.apache.flume.lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:src-1,state:IDLE} } - Exception follows.
      org.apache.flume.FlumeException: java.net.BindException: Address already in use
              at org.apache.flume.source.NetcatSource.start(NetcatSource.java:173)
              at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
              at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
              at java.lang.Thread.run(Thread.java:745)
      Caused by: java.net.BindException: Address already in use
              at sun.nio.ch.Net.bind0(Native Method)
              at sun.nio.ch.Net.bind(Net.java:444)
              at sun.nio.ch.Net.bind(Net.java:436)
              at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
              at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
              at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
              at org.apache.flume.source.NetcatSource.start(NetcatSource.java:167)
              ... 9 more
      

      The source's start() is then called again, which opens another socket that is likewise never closed, and so on. Each failed attempt therefore leaks one file descriptor (socket).
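
      One way to avoid this kind of leak is to close the channel whenever the bind fails, before the exception propagates. The following is a minimal sketch of that pattern in plain Java NIO, not the attached patch and not Flume's actual code; the class name SafeBindSketch and the helper bindOrClose are hypothetical, and the busy port is simulated with an ordinary ServerSocket on the loopback address.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.ServerSocketChannel;

// Hypothetical illustration; names are not taken from Flume.
class SafeBindSketch {

    /**
     * Binds the channel's socket to the given address.
     * If the bind fails, the channel is closed before the exception is rethrown,
     * so a later retry does not leave a descriptor behind.
     */
    public static void bindOrClose(ServerSocketChannel channel, InetSocketAddress address)
            throws IOException {
        try {
            channel.socket().bind(address);
        } catch (IOException | RuntimeException e) {
            try {
                channel.close();
            } catch (IOException closeError) {
                e.addSuppressed(closeError); // keep the original failure as the primary error
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Occupy an ephemeral port so that the bind below fails with "Address already in use".
        try (ServerSocket blocker = new ServerSocket(0, 50, loopback)) {
            InetSocketAddress busy = new InetSocketAddress(loopback, blocker.getLocalPort());
            ServerSocketChannel channel = ServerSocketChannel.open();
            try {
                bindOrClose(channel, busy);
            } catch (IOException e) {
                System.out.println("bind failed: " + e);
            }
            System.out.println("channel still open: " + channel.isOpen()); // prints false
        }
    }
}
```

      With this shape, every failed start attempt releases the socket it opened, so repeated retries do not accumulate descriptors.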

      This can be reproduced as follows:
      1. Configure NetcatSource as the source in the Flume agent configuration.
      2. Set the bind port of the NetcatSource to a port that is already in use. For example, I used 50010, the port of the DataNode's XCeiver protocol used by the HDFS service.
      3. Start the Flume agent and run "lsof -p <flume_process_id> | wc -l" repeatedly. The number of file descriptors keeps growing because of the leaked sockets, which lsof reports with errors such as "can't identify protocol".
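
      The same failure mode can be imitated without Flume or HDFS. The sketch below is only an approximation of the steps above under stated assumptions: a plain ServerSocket stands in for the service that already owns the port, the retry loop stands in for the repeated start() calls, and the names BindLeakRepro, retryLeakyStart and countOpen are hypothetical. It deliberately omits the close on failure to show how open channels pile up.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.channels.ServerSocketChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the leak; names are not taken from Flume.
class BindLeakRepro {

    /**
     * Imitates repeated start attempts against a busy address:
     * each attempt opens a channel, fails to bind, and does NOT close it.
     * The channels are returned only so the caller can inspect and clean them up.
     */
    public static List<ServerSocketChannel> retryLeakyStart(InetSocketAddress busy, int attempts)
            throws IOException {
        List<ServerSocketChannel> opened = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            ServerSocketChannel channel = ServerSocketChannel.open();
            opened.add(channel);
            try {
                channel.socket().bind(busy);
            } catch (IOException e) {
                // The bug being illustrated: no channel.close() on this path.
            }
        }
        return opened;
    }

    /** Counts how many of the given channels are still open. */
    public static long countOpen(List<ServerSocketChannel> channels) {
        return channels.stream().filter(ServerSocketChannel::isOpen).count();
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Stand-in for the service that already listens on the configured port.
        try (ServerSocket blocker = new ServerSocket(0, 50, loopback)) {
            InetSocketAddress busy = new InetSocketAddress(loopback, blocker.getLocalPort());
            List<ServerSocketChannel> leaked = retryLeakyStart(busy, 5);
            System.out.println("channels left open: " + countOpen(leaked)); // prints 5
            for (ServerSocketChannel channel : leaked) {
                channel.close(); // clean up so the sketch itself does not leak
            }
        }
    }
}
```

      Each channel left open here corresponds to one extra socket entry in the lsof output of step 3.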

      Attachments

        1. FLUME-2905-0.patch
          4 kB
          Siddharth Ahuja
        2. FLUME-2905-1.patch
          4 kB
          Siddharth Ahuja
        3. FLUME-2905-2.patch
          4 kB
          Siddharth Ahuja
        4. FLUME-2905-3.patch
          4 kB
          Siddharth Ahuja
        5. FLUME-2905-4.patch
          4 kB
          Siddharth Ahuja
        6. FLUME-2905-5.patch
          4 kB
          Siddharth Ahuja
        7. FLUME-2905-6.patch
          4 kB
          Siddharth Ahuja

          People

            Assignee: Siddharth Ahuja (sahuja)
            Reporter: Siddharth Ahuja (sahuja)
            Votes: 1
            Watchers: 8