Flume / FLUME-1814

Problem with the default Locale in RegexExtractorInterceptorMillisSerializer

    Details

    • Type: Bug
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v1.3.0
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      In Flume 1.3.0, the RegexExtractorInterceptorMillisSerializer interceptor cannot parse UK or US dates on a computer whose default Locale is French.

      The DateTimeFormatter created in the interceptor currently uses the default Locale, which is FR on my computer. When I try to parse some files that come from the US, I get the following exception:

      2012-12-31 17:09:13,370 (pool-5-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:148)] Uncaught exception in Runnable
      java.lang.IllegalArgumentException: Invalid format: "29/Dec/2012:05:09:34 -0700" is malformed at "Dec/2012:05:09:34 -0700"
              at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:866)
              at org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer.serialize(RegexExtractorInterceptorMillisSerializer.java:48)
              at org.apache.flume.interceptor.RegexExtractorInterceptor.intercept(RegexExtractorInterceptor.java:147)
              at org.apache.flume.interceptor.RegexExtractorInterceptor.intercept(RegexExtractorInterceptor.java:158)
              at org.apache.flume.interceptor.InterceptorChain.intercept(InterceptorChain.java:62)
              at org.apache.flume.channel.ChannelProcessor.processEventBatch(ChannelProcessor.java:146)
              at org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:143)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
              at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
              at java.lang.Thread.run(Thread.java:722)
      

      The solution I propose is to add a new property called "language" to the interceptor which will allow us to override the default Locale.
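
A minimal sketch of the idea, not the attached patch. To stay self-contained it uses the JDK's java.time classes as a stand-in for Joda-Time; with Joda-Time the analogous step would be DateTimeFormat.forPattern(pattern).withLocale(locale). The class name, the pattern string (inferred from the timestamp in the exception), and the "en" value standing in for the proposed "language" property are all assumptions made for illustration.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

// Hypothetical helper: not part of Flume.
class LocaleParseSketch {
    // Assumed pattern, inferred from "29/Dec/2012:05:09:34 -0700".
    static final String PATTERN = "dd/MMM/yyyy:HH:mm:ss Z";

    // Parse the timestamp with an explicit Locale and return epoch milliseconds.
    static long toMillis(String text, Locale locale) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(PATTERN, locale);
        return OffsetDateTime.parse(text, fmt).toInstant().toEpochMilli();
    }

    // True if the text can be parsed with the month names of the given Locale.
    static boolean parses(String text, Locale locale) {
        try {
            toMillis(text, locale);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String stamp = "29/Dec/2012:05:09:34 -0700";
        // Stand-in for a value read from the proposed "language" property.
        Locale fromConfig = Locale.forLanguageTag("en");

        System.out.println("French locale parses: " + parses(stamp, Locale.FRENCH));
        System.out.println("Configured locale parses: " + parses(stamp, fromConfig));
        System.out.println("Epoch millis: " + toMillis(stamp, fromConfig));
    }
}
```

The point of the sketch is that the month abbreviation "Dec" only matches when the formatter is given an English Locale, so the Locale should come from configuration and not from the machine running the agent.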

      1. flume-1814.patch
        5 kB
        Stéphane Moreau

        Activity

        Stéphane Moreau created issue -
        Stéphane Moreau made changes -
        Field: Description
        Original Value: "... When I try to parse some file I got from US, I got the following exception: ..."
        New Value: "... When I try to parse some files I got from US, I got the following exception: ..."
        The rest of the text, including the stack trace and the proposed solution, is identical to the Description above.
        Stéphane Moreau made changes -
        Attachment flume-1814.patch [ 12562804 ]
        Stéphane Moreau made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]

          People

          • Assignee: Unassigned
          • Reporter: Stéphane Moreau
          • Votes: 0
          • Watchers: 1