Flume / FLUME-2160

SpoolDirectorySource uncaught exception

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v1.3.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels:
      None
    • Environment:

      Linux 2.6.32-358.14.1.el6.x86_64 #1 SMP Tue Jul 16 23:51:20 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

      Description

      I've noticed some instabilities with Flume, in particular this uncaught exception. One of the bigger challenges with this type of exception is that the agent is left in a hung state, repeating the same exception. The tooling for inspecting which file or event is causing the issue is lacking or non-existent, and architecturally there should be some functionality equivalent to a "dead-letter queue". Here is the exception I'm dealing with now:

      13/08/14 15:37:05 ERROR source.SpoolDirectorySource: Uncaught exception in Runnable
      java.lang.IllegalStateException: Serializer has been closed
      at org.apache.flume.serialization.LineDeserializer.ensureOpen(LineDeserializer.java:124)
      at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:88)
      at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:221)
      at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:154)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)

        Activity

        Ted Malaska added a comment -

        Hey Gopinathan,

        Thanks for linking these together. That would be a great Jira to fix.

        I'm open to fixing this next. Let me check whether anyone else is working on this and FLUME-2119.

        Gopinathan A added a comment -

        One scenario could be a spool directory containing a file with the same name as one that is already in the COMPLETED state.

        I feel this issue is a duplicate of FLUME-2119.
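
The scenario described above can be checked for from outside the agent. The following is a minimal sketch, not Flume code: it assumes completed files are marked by appending a suffix such as ".COMPLETED" to the original name, and the class and method names (CompletedCollisionSketch, findCollisions) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CompletedCollisionSketch {

    /**
     * Returns the names of pending files in spoolDir for which a file named
     * name + completedSuffix already exists in the same directory.
     * The suffix convention is an assumption of this sketch.
     */
    public static List<String> findCollisions(Path spoolDir, String completedSuffix)
            throws IOException {
        try (Stream<Path> files = Files.list(spoolDir)) {
            return files
                .filter(Files::isRegularFile)
                .map(p -> p.getFileName().toString())
                // Skip files that are themselves already marked as completed.
                .filter(name -> !name.endsWith(completedSuffix))
                // Keep only pending files whose completed name is already taken.
                .filter(name -> Files.exists(spoolDir.resolve(name + completedSuffix)))
                .sorted()
                .collect(Collectors.toList());
        }
    }
}
```

Running such a check against the spool directory before starting the agent would show whether this scenario applies to a given hang.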

        Ted Malaska added a comment -

        OK, I've written the code to rename the file with a failed suffix. But before I finished the JUnit tests, I noticed that the error in this JIRA occurs when close() is called on currentFile.get().getDeserializer() and a subsequent read call is then made on it.

        From looking at the current code, I do not see how this could happen, unless the close() method on ReliableSpoolingFileEventReader is being called from a different thread.
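
To illustrate the hypothesis above, here is a minimal sketch of a reader that is closed by one thread and then read by another. LineReader is a simplified, hypothetical stand-in and not Flume's LineDeserializer; only the exception message is taken from the stack trace in the description.

```java
public class ClosedReaderSketch {

    /** Simplified stand-in for a deserializer with an open/closed state. */
    static class LineReader {
        private volatile boolean open = true;

        String readLine() {
            ensureOpen();
            return "line";
        }

        void close() {
            open = false;
        }

        private void ensureOpen() {
            if (!open) {
                throw new IllegalStateException("Serializer has been closed");
            }
        }
    }

    /**
     * Closes the reader on a second thread, waits for that thread to finish,
     * then reads on the calling thread and reports what happened.
     */
    public static String readAfterCloseFromOtherThread() throws InterruptedException {
        LineReader reader = new LineReader();
        Thread closer = new Thread(reader::close);
        closer.start();
        closer.join(); // the close is guaranteed to have happened before the read
        try {
            reader.readLine();
            return "no error";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }
}
```

In the sketch the join() makes the ordering deterministic. In a running agent the close and the read would race, which would explain why the failure is hard to reproduce from reading the code alone.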

        So Rob, you may find that moving to the newest version of Flume solves your problem.

        The fix I intend to add will not address the core issue you are having. It will only prevent it from repeating over and over.

        I will submit that patch and close this bug with it. When you test on a newer version of Flume, let us know if "java.lang.IllegalStateException: Serializer has been closed" continues to happen.

        If it does, we can make you a patch that adds some debugging to figure out what is happening on your system.

        Ted Malaska added a comment -

        OK, I've talked to a committer and I'm ready to implement this ticket.

        I'm downloading the code now, and I should have something to submit tonight or tomorrow.

        Ted Malaska added a comment -

        I've read through the code and this is a summary of my first design.

        In the readEvents(int numEvents) method of ReliableSpoolingFileEventReader, a try/catch block
        should wrap the following call, around line 234:

        currentFile.get().getDeserializer().readEvents(numEvents);

        If there is an exception, then the following should happen:
        1) An error message gets logged with the following information:
        1.1) The file name and path
        1.2) The starting event position before readEvents was called, and the number of events that were requested
        1.3) The IOException from the method
        2) The file should be renamed with a fileFailedSuffix
        2.1) Reuse the logic in rollCurrentFile rather than duplicating it
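
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the submitted patch: BatchReader is a hypothetical stand-in for the deserializer, the class and method names are made up, and catching IllegalStateException alongside IOException is an addition of this sketch, since that is the exception in the reported stack trace.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class FailedFileSketch {
    private static final Logger LOG = Logger.getLogger(FailedFileSketch.class.getName());

    /** Hypothetical stand-in for the deserializer's readEvents call. */
    public interface BatchReader {
        List<String> readEvents(int numEvents) throws IOException;
    }

    /**
     * Reads a batch of events. If the read fails, logs the file, the start
     * position and the requested count, renames the file with failedSuffix,
     * and returns an empty batch so the caller can move on to the next file.
     */
    public static List<String> readOrMarkFailed(Path file, BatchReader reader,
            long startEvent, int numEvents, String failedSuffix) throws IOException {
        try {
            return reader.readEvents(numEvents);
        } catch (IOException | IllegalStateException e) {
            LOG.log(Level.SEVERE, "Failed reading " + file.toAbsolutePath()
                + " (start event " + startEvent
                + ", requested " + numEvents + " events)", e);
            // Rename in place, e.g. a.log becomes a.log.FAILED.
            // Files.move fails if the target name already exists.
            Path failed = file.resolveSibling(file.getFileName() + failedSuffix);
            Files.move(file, failed);
            return Collections.emptyList();
        }
    }
}
```

Renaming in place keeps the failed file next to its siblings, which matches the suffix convention the source already uses for completed files.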

        Question to Flume people: Is there a reason why the SpoolDirectorySource uses suffixes rather than multiple directories?

        Ted Malaska added a comment -

        Is anyone working on this one? If not, I would love to give it a try.

        The week after this should be slow for me.


          People

          • Assignee:
            Unassigned
            Reporter:
            Rob
          • Votes:
            1
            Watchers:
            5

            Dates

            • Created:
              Updated:
