Flume
  1. Flume
  2. FLUME-1175

RollingFileSink complains of Bad File Descriptor upon a reconfig event

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: v1.2.0
    • Fix Version/s: v1.4.0
    • Component/s: Sinks+Sources
    • Labels:
      None
    • Environment:

      CentOS 6.2 64-bit

      Description

      Steps:
      1) Create a config file that looks something like this:
      agent.channels = c1
      agent.sources = r1
      agent.sinks = k1
      #
      agent.channels.c1.type = MEMORY
      #
      agent.sources.r1.channels = c1
      agent.sources.r1.type = SEQ
      #
      agent.sinks.k1.channel = c1
      agent.sinks.k1.type = FILE_ROLL
      agent.sinks.k1.sink.directory = /var/log/flume-ng
      agent.sinks.k1.sink.rollInterval = 0

      2) Start the Flume NG agent

      3) touch the config file so that a reconfig event is triggered within 30 secs

      4) tail the output file to observer the sequence generator events:
      tail -f /var/log/flume-ng/XXXXXXXXXXXX

      5) Notice that the flow suddenly stops at the reconfig event (within 30 secs after touching the config file). Flow doesn't continue. The flume log shows a Bad File Descriptor error for the RollingFileSink:

      2012-05-03 01:34:34,806 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)] Unable to deliver event. Exception follows.
      org.apache.flume.EventDeliveryException: Failed to process event: [Event headers =

      {timestamp=1336034074797, nanos=3762297996593382, pri=INFO, host=<mysupersecrethost>, FlumeOG=yes, execcmd=java.nio.HeapByteBuffer[pos=0 lim=24 cap=24], procsource=java.nio.HeapByteBuffer[pos=0 lim=6 cap=6], service=java.nio.HeapByteBuffer[pos=0 lim=4 cap=4]}

      , body.length = 26 ]
      at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:201)
      at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
      at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
      at java.lang.Thread.run(Thread.java:662)
      Caused by: java.io.IOException: Bad file descriptor
      at java.io.FileOutputStream.writeBytes(Native Method)
      at java.io.FileOutputStream.write(FileOutputStream.java:282)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
      at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:193)
      ... 3 more

        Issue Links

          Activity

          Transition Time In Source Status Execution Times Last Executer Last Execution Date
          Open Open Patch Available Patch Available
          214d 12h 11m 1 Roshan Naik 03/Dec/12 21:52
          Patch Available Patch Available Resolved Resolved
          3d 18h 21m 1 Brock Noland 07/Dec/12 16:13
          Hide
          Hudson added a comment -

          Integrated in flume-trunk #337 (See https://builds.apache.org/job/flume-trunk/337/)
          FLUME-1175: RollingFileSink complains of Bad File Descriptor upon a reconfig event (Revision e858adee407bf76cf2a036433ed4173714505924)

          Result = SUCCESS
          brock : http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=e858adee407bf76cf2a036433ed4173714505924
          Files :

          • flume-ng-core/src/main/java/org/apache/flume/sink/RollingFileSink.java
          Show
          Hudson added a comment - Integrated in flume-trunk #337 (See https://builds.apache.org/job/flume-trunk/337/ ) FLUME-1175 : RollingFileSink complains of Bad File Descriptor upon a reconfig event (Revision e858adee407bf76cf2a036433ed4173714505924) Result = SUCCESS brock : http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=e858adee407bf76cf2a036433ed4173714505924 Files : flume-ng-core/src/main/java/org/apache/flume/sink/RollingFileSink.java
          Brock Noland made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Hide
          Brock Noland added a comment -

          Thank you for the patch Roshan! I have committed this to trunk and 1.4!

          Show
          Brock Noland added a comment - Thank you for the patch Roshan! I have committed this to trunk and 1.4!
          Hide
          Brock Noland added a comment -

          Roshan,

          Regarding the logging the exception. Yes that is a deeper topic. I don't think our semantics around this are clear. Additionally, lets say you have 5 flows in agent, do you want one of them to kill the whole agent? This are things which aren't perfectly clear. Without a clear statement of how things should work I think what we should do is catch and log anything inside of the stop/configure/start.

          Show
          Brock Noland added a comment - Roshan, Regarding the logging the exception. Yes that is a deeper topic. I don't think our semantics around this are clear. Additionally, lets say you have 5 flows in agent, do you want one of them to kill the whole agent? This are things which aren't perfectly clear. Without a clear statement of how things should work I think what we should do is catch and log anything inside of the stop/configure/start.
          Roshan Naik made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Hide
          Roshan Naik added a comment -

          nulling out outputStream in RollingFileSink.stop()
          removing superfluous calls to outputStream.flush()

          Show
          Roshan Naik added a comment - nulling out outputStream in RollingFileSink.stop() removing superfluous calls to outputStream.flush()
          Roshan Naik made changes -
          Remote Link This issue links to "Review Board Link (Web Link)" [ 11527 ]
          Roshan Naik made changes -
          Attachment FLUME-1175.patch [ 12555560 ]
          Hide
          Roshan Naik added a comment -
          • nulling out outputStream in RollingFileSink.stop()
          • removing superfluous calls to outputStream.flush()
          Show
          Roshan Naik added a comment - nulling out outputStream in RollingFileSink.stop() removing superfluous calls to outputStream.flush()
          Hide
          Roshan Naik added a comment -

          maybe that issue is a deeper topic than this jira ?

          Show
          Roshan Naik added a comment - maybe that issue is a deeper topic than this jira ?
          Hide
          Roshan Naik added a comment -

          ok.. i have verified that null-ing it out is the right solution.

          One other change i wanted to get opinions on before submitting a patch

          The catch clauses in stop() are swallowing any exception from flush() or close(). But it seems to me that if flush()/close() fails, in the reconfig scenario, an exception should be thrown and let flume to die. In a normal shutdown case this is not necessary though. However the stop() does not allow exceptions to be thrown.

          Show
          Roshan Naik added a comment - ok.. i have verified that null-ing it out is the right solution. One other change i wanted to get opinions on before submitting a patch The catch clauses in stop() are swallowing any exception from flush() or close(). But it seems to me that if flush()/close() fails, in the reconfig scenario, an exception should be thrown and let flume to die. In a normal shutdown case this is not necessary though. However the stop() does not allow exceptions to be thrown.
          Roshan Naik made changes -
          Assignee Roshan Naik [ roshan_naik ]
          Hide
          Will McQueen added a comment -

          Hi Roshan,

          Not at the moment, but if you have some time to take it up then that's fine
          with me.

          Cheers,
          Will

          Show
          Will McQueen added a comment - Hi Roshan, Not at the moment, but if you have some time to take it up then that's fine with me. Cheers, Will
          Hide
          Roshan Naik added a comment -

          Will McQueen, are you working on this ?

          Show
          Roshan Naik added a comment - Will McQueen, are you working on this ?
          Brock Noland made changes -
          Fix Version/s v1.4.0 [ 12323372 ]
          Fix Version/s v1.3.0 [ 12322140 ]
          Mike Percy made changes -
          Fix Version/s v1.3.0 [ 12322140 ]
          Fix Version/s v1.2.0 [ 12320243 ]
          Will McQueen made changes -
          Field Original Value New Value
          Priority Major [ 3 ] Critical [ 2 ]
          Hide
          Will McQueen added a comment - - edited

          I'm not sure, but what I think may be happening is:
          1) Reconfig event occurs
          2) RollingFileSink finishes executing its process() method, or maybe the sink runner thread that's calling RollingFileSink.process gets interrupted somewhere in the middle of the process() call. Either way, I believe in these cases that it's possible that the process() method will return while the outputStream field still has a non-null value pointing to some BufferedOutputStreamObject
          3) As part of the reconfig steps, the sink's configure() is called next. And configure() sets the 'directory' field, but outputStream field is still set to the old value.
          4) Next, stop() is called (again as part of reconfig steps). outputStream field's object is flushed and closed, but not nulled-out afterwards.
          5) Next, start() is called. Reconfig steps are done.
          6) Next, sink runner calls process(), which checks if outputStream != null (which is true, since outputStream is pointing to the old, closed BufferedOutputStream).

          So maybe one possible fix could be to insert a line in stop() to null-out the outputStream field (and also null-out the serializer field while we're at it)... something like this:

          if (serializer != null) {
            try {
              serializer.flush();
              serializer.beforeClose();
            } catch (IOException e) {
              logger.error("Unable to cleanup serializer. Exception follows.", e);
            } finally {
              serializer = null;
            }
          }
          
          if (outputStream != null) {
            logger.debug("Closing file {}", pathController.getCurrentFile());
            try {
              outputStream.flush();
              outputStream.close();
            } catch (IOException e) {
              logger.error("Unable to close outputStream. Exception follows.", e);
            } finally {
              outputStream = null;
            }
          }
          
          Show
          Will McQueen added a comment - - edited I'm not sure, but what I think may be happening is: 1) Reconfig event occurs 2) RollingFileSink finishes executing its process() method, or maybe the sink runner thread that's calling RollingFileSink.process gets interrupted somewhere in the middle of the process() call. Either way, I believe in these cases that it's possible that the process() method will return while the outputStream field still has a non-null value pointing to some BufferedOutputStreamObject 3) As part of the reconfig steps, the sink's configure() is called next. And configure() sets the 'directory' field, but outputStream field is still set to the old value. 4) Next, stop() is called (again as part of reconfig steps). outputStream field's object is flushed and closed, but not nulled-out afterwards. 5) Next, start() is called. Reconfig steps are done. 6) Next, sink runner calls process(), which checks if outputStream != null (which is true, since outputStream is pointing to the old, closed BufferedOutputStream). So maybe one possible fix could be to insert a line in stop() to null-out the outputStream field (and also null-out the serializer field while we're at it)... something like this: if (serializer != null ) { try { serializer.flush(); serializer.beforeClose(); } catch (IOException e) { logger.error( "Unable to cleanup serializer. Exception follows." , e); } finally { serializer = null ; } } if (outputStream != null ) { logger.debug( "Closing file {}" , pathController.getCurrentFile()); try { outputStream.flush(); outputStream.close(); } catch (IOException e) { logger.error( "Unable to close outputStream. Exception follows." , e); } finally { outputStream = null ; } }
          Will McQueen created issue -

            People

            • Assignee:
              Roshan Naik
              Reporter:
              Will McQueen
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development