Flume / FLUME-2119

Duplicate files cause Flume to enter an irrecoverable state

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v1.4.0
    • Fix Version/s: v1.5.0
    • Component/s: Sinks+Sources
    • Labels:
      None

      Description

      If a spooling directory receives FileA and, after Flume has picked it up and renamed it to FileA.COMPLETED, another file with the same original name (FileA) is placed in the directory, Flume logs an IllegalStateException indefinitely. This is likely because Flume attempts to rename the second FileA to FileA.COMPLETED and finds that the file already exists.

      Once Flume has entered this state, it can only be recovered by removing the .COMPLETED file from the directory and restarting the agent.

      Log message looks like this:

      02 Jul 2013 21:32:09,371 ERROR [pool-4-thread-1] (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:164) - Uncaught exception in Runnable
      java.lang.IllegalStateException: Serializer has been closed
      at org.apache.flume.serialization.LineDeserializer.ensureOpen(LineDeserializer.java:124)
      at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:88)
      at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:221)
      at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:154)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
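
      The name collision suspected in the description can be reproduced in isolation. The following is a minimal sketch, not Flume's actual code: the class SpoolRenameSketch and the helper markCompleted are hypothetical names, and the sketch only assumes that a fully read file is renamed with a .COMPLETED suffix and that an existing destination is treated as an error.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of the rename step; not taken from the Flume source.
class SpoolRenameSketch {
    static final String COMPLETED_SUFFIX = ".COMPLETED";

    // Renames a fully read spool file to <name>.COMPLETED.
    // Refuses to overwrite an existing completed file (assumed behaviour).
    static Path markCompleted(Path file) throws IOException {
        Path dest = file.resolveSibling(file.getFileName() + COMPLETED_SUFFIX);
        if (Files.exists(dest)) {
            throw new IllegalStateException("File name has been re-used: " + dest);
        }
        return Files.move(file, dest);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("spool");
        Path fileA = dir.resolve("FileA");

        // First FileA is processed and renamed without problems.
        Files.write(fileA, "first\n".getBytes());
        markCompleted(fileA);

        // A second file with the same name collides with FileA.COMPLETED.
        Files.write(fileA, "second\n".getBytes());
        try {
            markCompleted(fileA);
        } catch (IllegalStateException e) {
            System.out.println("collision: " + e.getMessage());
        }
    }
}
```

      In this sketch the second FileA stays in the directory after the failure, which is consistent with the report that the condition persists until the .COMPLETED file is removed.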

        Issue Links

          Activity

          Gopinathan A added a comment -

          Any update on this JIRA?

          Phil Scala made changes -
          Status: Patch Available [ 10002 ] -> Open [ 1 ]
          Phil Scala added a comment -

          The patch needs some more thought; re-opening.

          Phil Scala made changes -
          Attachment FLUME-2119-0.patch [ 12608805 ]
          Phil Scala made changes -
          Attachment FLUME-2119-0.patch [ 12608784 ]
          Phil Scala made changes -
          Status: In Progress [ 3 ] -> Patch Available [ 10002 ]
          Affects Version/s v1.4.0 [ 12323372 ]
          Fix Version/s v1.5.0 [ 12324642 ]
          Phil Scala made changes -
          Attachment FLUME-2119-0.patch [ 12608784 ]
          Phil Scala added a comment -

          Initial patch shared for review. A Review Board posting will follow soon.

          This patch simply uses a configuration setting to turn IllegalStateExceptions on or off within the ReliableSpoolingFileEventReader.

          Note: Flume 1.5 does not go into the spiral of events when a duplicate file is spooled; it logs one error indicating the need to restart the agent.

          However, this patch can still provide value if you do not want the agent to halt under such conditions.

          Second note: if the completed file cannot be deleted, an exception is still thrown; otherwise Flume would be stuck in a loop parsing the same file over and over. Throwing the exception seemed like the better choice.
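
          The behaviour described in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of FLUME-2119-0.patch: the class SpoolPolicySketch and its method are hypothetical, the boolean stands in for the "useStrictSpooledFilePolicies" setting mentioned elsewhere in this issue, and the sketch assumes that "the completed file" in the second note means the older .COMPLETED file that is in the way.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a configurable strict/lenient rename policy.
class SpoolPolicySketch {
    // Stand-in for the "useStrictSpooledFilePolicies" setting (assumed semantics).
    private final boolean strict;

    SpoolPolicySketch(boolean strict) {
        this.strict = strict;
    }

    Path markCompleted(Path file) throws IOException {
        Path dest = file.resolveSibling(file.getFileName() + ".COMPLETED");
        if (Files.exists(dest)) {
            String message = "File name has been re-used: " + dest;
            // Always log the problem, whatever the policy.
            System.err.println("ERROR " + message);
            if (strict) {
                throw new IllegalStateException(message);
            }
            try {
                // Lenient mode: replace the older completed file (assumption).
                Files.delete(dest);
            } catch (IOException e) {
                // Still fail here, so the same file is not parsed in a loop.
                throw new IllegalStateException("Cannot delete " + dest, e);
            }
        }
        return Files.move(file, dest);
    }
}
```

          With new SpoolPolicySketch(true) a duplicate file name still raises the exception; with new SpoolPolicySketch(false) the duplicate is processed and the older .COMPLETED file is replaced, which implies duplicate events downstream.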

          Phil Scala made changes -
          Status: Open [ 1 ] -> In Progress [ 3 ]
          Phil Scala made changes -
          Assignee Phil Scala [ scaph01 ]
          Phil Scala added a comment -

          I have a patch and am working on unit tests, but Eclipse is not behaving on my Windows machine, so I am building an Ubuntu dev VM now.

          Phil Scala made changes -
          Link This issue Is contained by FLUME-2066 [ FLUME-2066 ]
          Phil Scala added a comment -

          My patch is probably pretty dated now. I will merge my changes into the latest trunk and create a patch over the next 24-48 hours.

          Ted Malaska added a comment -

          Thanks Gopinathan,

          Phil Scala, let me know if you have a patch or if you want me to work on this.

          Gopinathan A added a comment -

          Phil Scala, you can submit your patch.

          I hit a similar issue: FLUME-2160.

          Phil Scala added a comment -

          I too see this issue at times, mostly from human error: someone wants to spool a file and does not realize it is already there (in a completed state). I have a local patch that allows for this scenario by loosening the spooled file source policies. My current implementation is a setting on the spooling directory source called "useStrictSpooledFilePolicies". In places where a new IllegalStateException(message) was thrown, I log the error and check the setting value, throwing the exception only when the setting is "true" (i.e. be strict).

          Note that this will lead to duplicate events in the sink store, so it needs to be used with caution.

          I can submit my patch if others see value.

          For an immediate workaround, set the delete policy in 1.4 to "IMMEDIATE", which does not keep the .COMPLETED file. This of course means you have no .COMPLETED files to use as proof of spooling.
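
          The workaround might be configured along the following lines. This is a configuration fragment only: the agent name agent1, the source name spool1, and the directory path are placeholders, and the channel and sink wiring is omitted. The deletePolicy property of the spooling directory source accepts never (the default) or immediate.

```properties
# Hypothetical agent and source names; adjust to the actual agent configuration.
agent1.sources.spool1.type = spooldir
agent1.sources.spool1.spoolDir = /var/spool/flume
# Delete each file once it is fully ingested instead of renaming it to .COMPLETED.
agent1.sources.spool1.deletePolicy = immediate
```

          With this policy no .COMPLETED file is left behind, so a later file with the same name has nothing to collide with, at the cost of losing the record of what was spooled.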

          Jonathan Cooper-Ellis made changes -
          Field: Description
          Change: appended the log message and stack trace to the description; the existing text is unchanged (see Description above).
          Jonathan Cooper-Ellis created issue -

            People

            • Assignee:
              Phil Scala
            • Reporter:
              Jonathan Cooper-Ellis
            • Votes:
              0
            • Watchers:
              6
