  Flume / FLUME-2052

Spooling directory source should be able to replace or ignore malformed characters

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.0
    • Fix Version/s: 1.5.0
    • Component/s: None
    • Environment:

      CentOS 6.3
      Flume 1.3.0

      Description

      When Flume parses a file with malformed character encoding, it throws the following error:

      23 May 2013 22:06:29,446 ERROR [pool-12-thread-1] (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:164) - Uncaught exception in Runnable
      java.nio.charset.MalformedInputException: Input length = 1
      at java.nio.charset.CoderResult.throwException(CoderResult.java:277)
      at org.apache.flume.serialization.ResettableFileInputStream.readChar(ResettableFileInputStream.java:162)
      at org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:134)
      at org.apache.flume.serialization.LineDeserializer.readEvent(LineDeserializer.java:72)
      at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:91)
      at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:221)
      at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:154)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:722)

      It would be useful to be able to skip (ignore) or replace such characters instead of failing. The corrupt characters come from spamming engines, and Flume currently cannot handle them at all.
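      The stack trace shows the exception being raised by `CoderResult.throwException` while `ResettableFileInputStream.readChar` decodes bytes, i.e. the character decoder is set to report malformed input as an error. The sketch below uses only the standard `java.nio.charset` API to show that behavior alongside the two alternatives this issue asks for. `CodingErrorAction.REPORT` reproduces the failure, and `REPLACE` and `IGNORE` correspond to replacing or dropping the bad bytes. The class `DecodePolicyDemo` and the enum `DecodeErrorPolicy` are hypothetical names chosen for illustration. This is not Flume's actual fix.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodePolicyDemo {

    // Hypothetical policy names for illustration only; not taken from Flume's code.
    enum DecodeErrorPolicy { FAIL, REPLACE, IGNORE }

    // Builds a UTF-8 decoder whose reaction to bad input follows the chosen policy.
    static CharsetDecoder decoderFor(DecodeErrorPolicy policy) {
        CodingErrorAction action;
        switch (policy) {
            case REPLACE: action = CodingErrorAction.REPLACE; break; // substitute U+FFFD
            case IGNORE:  action = CodingErrorAction.IGNORE;  break; // drop the bad bytes
            default:      action = CodingErrorAction.REPORT;  break; // throw, as in the trace above
        }
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
    }

    // Decodes the whole byte array; throws MalformedInputException under FAIL.
    static String decode(byte[] bytes, DecodeErrorPolicy policy) throws CharacterCodingException {
        CharBuffer chars = decoderFor(policy).decode(ByteBuffer.wrap(bytes));
        return chars.toString();
    }

    public static void main(String[] args) {
        // "ab", a lone 0xFF byte (never valid in UTF-8), then "cd".
        byte[] input = {'a', 'b', (byte) 0xFF, 'c', 'd'};
        for (DecodeErrorPolicy policy : DecodeErrorPolicy.values()) {
            try {
                System.out.println(policy + ": " + decode(input, policy));
            } catch (CharacterCodingException e) {
                System.out.println(policy + ": " + e);
            }
        }
    }
}
```

      Under `FAIL`, the printed exception is `java.nio.charset.MalformedInputException: Input length = 1`, which is the same message that appears in the log above. `REPLACE` keeps a visible marker (U+FFFD) where the corrupt byte was. `IGNORE` silently drops it, which loses information but yields clean text.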

              People

              • Assignee:
                mpercy Mike Percy
                Reporter:
                greg.glazewski@cp.net greg glazeweas
              • Votes:
                3
                Watchers:
                9