  Flume / FLUME-2052

Spooling directory source should be able to replace or ignore malformed characters

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v1.4.0
    • Fix Version/s: v1.5.0
    • Component/s: None
    • Environment:

      CentOS 6.3
      Flume 1.3.0

      Description

      When parsing a file that contains malformed character encoding, Flume throws this error:

      23 May 2013 22:06:29,446 ERROR [pool-12-thread-1] (org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run:164) - Uncaught exception in Runnable
      java.nio.charset.MalformedInputException: Input length = 1
      at java.nio.charset.CoderResult.throwException(CoderResult.java:277)
      at org.apache.flume.serialization.ResettableFileInputStream.readChar(ResettableFileInputStream.java:162)
      at org.apache.flume.serialization.LineDeserializer.readLine(LineDeserializer.java:134)
      at org.apache.flume.serialization.LineDeserializer.readEvent(LineDeserializer.java:72)
      at org.apache.flume.serialization.LineDeserializer.readEvents(LineDeserializer.java:91)
      at org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:221)
      at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:154)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:722)

      It would be good to be able to skip, ignore, or delete such characters. The corrupt characters come from spam engines, and Flume can't handle them at all.
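      The exception in the trace above comes from the JDK charset decoder, not from Flume-specific code. The following is a minimal sketch, using only `java.nio.charset`, of how the same `MalformedInputException` arises and how the decoder's error action changes the outcome. The class name `MalformedInputDemo` and the sample bytes are illustrative assumptions and are not taken from the Flume source or from the attached patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Illustrative class name; this is not part of Flume.
public class MalformedInputDemo {

    // Decodes the bytes as UTF-8, applying the given action to malformed
    // and unmappable input alike.
    public static String decode(byte[] bytes, CodingErrorAction action)
            throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws Exception {
        // "ab", then a lone 0xFF byte (never valid in UTF-8), then "cd".
        byte[] bytes = {'a', 'b', (byte) 0xFF, 'c', 'd'};

        try {
            // REPORT makes the decoder throw on the bad byte.
            decode(bytes, CodingErrorAction.REPORT);
        } catch (MalformedInputException e) {
            System.out.println("REPORT  -> " + e);
        }

        // REPLACE substitutes the decoder's replacement string for the bad byte.
        System.out.println("REPLACE -> " + decode(bytes, CodingErrorAction.REPLACE));

        // IGNORE drops the bad byte and continues decoding.
        System.out.println("IGNORE  -> " + decode(bytes, CodingErrorAction.IGNORE));
    }
}
```

      With `REPORT` the whole decode fails on a single bad byte, which matches the behavior described in this issue. `REPLACE` and `IGNORE` correspond to the "replace or ignore" options named in the issue title.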

          Activity

          jay.liu Jay Liu added a comment -

          This is a major show stopper for us. I agree that it should skip such characters.

          mpercy Mike Percy added a comment -

          If anyone has time to submit a patch for this I'm willing to review and try to pull it into Flume 1.4.

          jaehong.choi Jaehong Choi added a comment -

          This happened to me as well. It would be nice to add a configuration option that decides how a MalformedInputException is handled.
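          One way such a configuration option could work is to map a string value onto the JDK's `CodingErrorAction`. The sketch below is hypothetical: the names `PolicySketch`, `MalformedInputPolicy`, and `fromConfig`, and the choice of `FAIL` as the default, are assumptions made for illustration and are not taken from the patch attached to this issue.

```java
import java.nio.charset.CodingErrorAction;
import java.util.Locale;

// Illustrative wrapper class; this is not part of Flume.
public class PolicySketch {

    // Hypothetical policy enum mapping a configured name to a decoder action.
    public enum MalformedInputPolicy {
        FAIL(CodingErrorAction.REPORT),      // throw on a bad byte sequence
        REPLACE(CodingErrorAction.REPLACE),  // substitute the replacement character
        IGNORE(CodingErrorAction.IGNORE);    // drop the bad byte sequence

        private final CodingErrorAction action;

        MalformedInputPolicy(CodingErrorAction action) {
            this.action = action;
        }

        public CodingErrorAction action() {
            return action;
        }

        // Parses a configured value case-insensitively. A missing or blank
        // value falls back to FAIL (an assumed default). An unknown name
        // raises IllegalArgumentException via valueOf.
        public static MalformedInputPolicy fromConfig(String value) {
            if (value == null || value.trim().isEmpty()) {
                return FAIL;
            }
            return valueOf(value.trim().toUpperCase(Locale.ENGLISH));
        }
    }

    public static void main(String[] args) {
        MalformedInputPolicy policy = MalformedInputPolicy.fromConfig("replace");
        System.out.println(policy + " maps to " + policy.action());
    }
}
```

          The resulting action would then be passed to `CharsetDecoder.onMalformedInput` and `CharsetDecoder.onUnmappableCharacter` wherever the decoder is created.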

          mpercy Mike Percy added a comment -

          Attaching patch to address this issue.

          hshreedharan Hari Shreedharan added a comment -

          +1. Looks good. Committing.

          hshreedharan Hari Shreedharan added a comment -

          Committed, rev: b84d016. Thanks Mike!

          hudson Hudson added a comment -

          SUCCESS: Integrated in flume-trunk #506 (See https://builds.apache.org/job/flume-trunk/506/)
          FLUME-2052. Spooling directory source should be able to replace or ignore malformed characters (hshreedharan: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=b84d01615a47c8152cfa1119a52a1a1f1b445843)

          • flume-ng-doc/sphinx/FlumeUserGuide.rst
          • flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySourceConfigurationConstants.java
          • flume-ng-core/src/main/java/org/apache/flume/source/SpoolDirectorySource.java
          • flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableFileInputStream.java
          • flume-ng-core/src/main/java/org/apache/flume/client/avro/ReliableSpoolingFileEventReader.java
          • flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java
          • flume-ng-core/src/main/java/org/apache/flume/serialization/DecodeErrorPolicy.java

            People

            • Assignee:
              mpercy Mike Percy
              Reporter:
              greg.glazewski@cp.net greg glazeweas
            • Votes:
              3
              Watchers:
              9
