Flume / FLUME-1233

HDFS Sink has problem with %c escape sequence in bucket path

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: Sinks+Sources
    • Labels: None
    • Environment: CentOS 5.6 64-bit

    Description

      Steps:

      1) Create a flume.conf file that specifies a bucket path containing an escape sequence. Here is a partial config file:

      agent.sinks.k1.channel = c1
      agent.sinks.k1.type = HDFS
      #agent.sinks.k1.hdfs.round = true
      #agent.sinks.k1.hdfs.roundUnit = minute
      #agent.sinks.k1.hdfs.roundValue = 2
      agent.sinks.k1.hdfs.path = hdfs://blah.example.com/blah-test-ch01-%{host}
      agent.sinks.k1.hdfs.fileType = DataStream
      agent.sinks.k1.hdfs.rollInterval = 0
      agent.sinks.k1.hdfs.rollSize = 0
      agent.sinks.k1.hdfs.rollCount = 0
      agent.sinks.k1.hdfs.batchSize = 1000
      agent.sinks.k1.hdfs.txnEventMax = 1000

      2) Try to send an event that has a timestamp in its header (HINT: you can use an interceptor to add a timestamp to the header of all events generated by SequenceGeneratorSource)
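
      For step 2, a timestamp header can be added with Flume's timestamp interceptor. The fragment below is a minimal sketch, not the reporter's actual configuration: the source name r1 is hypothetical (the report does not name its source), and the "seq" and "timestamp" type aliases are assumed to be available in the Flume version in use.

      ```
      # Hypothetical source r1 feeding the channel c1 used by sink k1.
      agent.sources = r1
      agent.sources.r1.type = seq
      agent.sources.r1.channels = c1

      # Attach a timestamp interceptor so every event carries a
      # "timestamp" header, which the HDFS sink needs for date escapes.
      agent.sources.r1.interceptors = i1
      agent.sources.r1.interceptors.i1.type = timestamp
      ```

      If the short alias is not recognized, the interceptor may need to be named by its builder class instead.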

      The log then shows ERROR entries with exceptions:

      2012-05-29 09:35:20,343 INFO hdfs.BucketWriter: Creating hdfs://blah.example.com/blah-test-ch08-Tue May 29 09:35:18 2012/FlumeData.1274356498034827.tmp
      2012-05-29 09:35:20,359 ERROR hdfs.HDFSEventSink: process failed
      java.lang.IllegalArgumentException: Pathname /blah-test-ch08-Tue May 29 09:35:18 2012/FlumeData.1274356498034827.tmp from hdfs://blah.example.com/blah-test-ch08-Tue May 29 09:35:18 2012/FlumeData.1274356498034827.tmp is not a valid DFS filename.
      at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:165)
      at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:219)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:584)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:565)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:472)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:464)
      at org.apache.flume.sink.hdfs.HDFSDataStream.open(HDFSDataStream.java:60)
      at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:121)
      at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:179)
      at org.apache.flume.sink.hdfs.HDFSEventSink$1.doCall(HDFSEventSink.java:432)
      at org.apache.flume.sink.hdfs.HDFSEventSink$1.doCall(HDFSEventSink.java:429)
      at org.apache.flume.sink.hdfs.HDFSEventSink$ProxyCallable.call(HDFSEventSink.java:164)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      2012-05-29 09:35:20,361 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
      org.apache.flume.EventDeliveryException: java.lang.IllegalArgumentException: Pathname /blah-test-ch08-Tue May 29 09:35:18 2012/FlumeData.1274356498034827.tmp from hdfs://blah.example.com/blah-test-ch08-Tue May 29 09:35:18 2012/FlumeData.1274356498034827.tmp is not a valid DFS filename.
      at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:469)
      at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
      at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
      at java.lang.Thread.run(Thread.java:662)
      Caused by: java.lang.IllegalArgumentException: Pathname /blah-test-ch08-Tue May 29 09:35:18 2012/FlumeData.1274356498034827.tmp from hdfs://blah.example.com/blah-test-ch08-Tue May 29 09:35:18 2012/FlumeData.1274356498034827.tmp is not a valid DFS filename.
      at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:165)
      at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:219)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:584)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:565)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:472)
      at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:464)
      at org.apache.flume.sink.hdfs.HDFSDataStream.open(HDFSDataStream.java:60)
      at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:121)
      at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:179)
      at org.apache.flume.sink.hdfs.HDFSEventSink$1.doCall(HDFSEventSink.java:432)
      at org.apache.flume.sink.hdfs.HDFSEventSink$1.doCall(HDFSEventSink.java:429)
      at org.apache.flume.sink.hdfs.HDFSEventSink$ProxyCallable.call(HDFSEventSink.java:164)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      at java.util.concurrent.FutureTask.run(FutureTask.java:138)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      ... 1 more

      NOTE: According to the docs, %c is "locale's date and time", and the example they give is "Thu Mar 3 23:05:25 2005".
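
      The expanded %c value contains spaces and colons, and the colons in particular appear to be what HDFS rejects in a path component. A possible workaround, which this report does not confirm, is to build the date from individual escapes so the resulting directory name has neither. The sketch below assumes the %Y, %m, %d, %H, %M, and %S escapes listed in the HDFS sink documentation:

      ```
      # Workaround sketch: spell out the timestamp instead of using %c.
      # Expands to something like blah-test-ch01-2012-05-29-093518.
      agent.sinks.k1.hdfs.path = hdfs://blah.example.com/blah-test-ch01-%Y-%m-%d-%H%M%S
      ```

      Note that a path with second-level resolution creates a new bucket directory for every second of event time, much as %c would; dropping %M and %S gives coarser buckets.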

      People

        Assignee: Unassigned
        Reporter: Will McQueen (will@cloudera.com)
        Votes: 0
        Watchers: 2
