Flume / FLUME-3106

When the sink's batchSize is greater than the Memory Channel's transactionCapacity, Flume can produce endless data

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.9.0
    • Component/s: Channel
    • Labels:
      None

      Description

      Flume can produce endless data when using the following config:

      agent.sources = src1
      agent.sinks = sink1
      agent.channels = ch2
      
      agent.sources.src1.type = spooldir
      agent.sources.src1.channels = ch2
      agent.sources.src1.spoolDir = /home/kafka/flumeSpooldir
      agent.sources.src1.fileHeader = false
      agent.sources.src1.batchSize = 5
      
      agent.channels.ch2.type=memory
      agent.channels.ch2.capacity=100
      agent.channels.ch2.transactionCapacity=5
      
      agent.sinks.sink1.type = hdfs
      agent.sinks.sink1.channel = ch2
      agent.sinks.sink1.hdfs.path = hdfs://kafka1:9000/flume/
      agent.sinks.sink1.hdfs.rollInterval=1
      agent.sinks.sink1.hdfs.fileType = DataStream
      agent.sinks.sink1.hdfs.writeFormat = Text
      agent.sinks.sink1.hdfs.batchSize = 10
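The trigger in this config is that the sink's hdfs.batchSize (10) exceeds ch2.transactionCapacity (5), so one sink transaction tries to take more events than the channel's take list can hold. A mismatch like this could be caught before starting the agent. The sketch below is a hypothetical helper, not part of Flume: it parses the properties with java.util.Properties and assumes a default of 100 for both settings when they are absent.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical pre-flight check (not part of Flume): flags an HDFS sink whose
// hdfs.batchSize exceeds the transactionCapacity of the channel it reads from.
public class BatchSizeCheck {

    // Assumed defaults for both settings when they are not configured.
    static final String ASSUMED_DEFAULT = "100";

    // Returns true if one sink batch fits inside a single channel transaction.
    static boolean batchFits(Properties p, String agent, String sink) {
        String channel = p.getProperty(agent + ".sinks." + sink + ".channel", "").trim();
        int batchSize = Integer.parseInt(p.getProperty(
                agent + ".sinks." + sink + ".hdfs.batchSize", ASSUMED_DEFAULT).trim());
        int txnCapacity = Integer.parseInt(p.getProperty(
                agent + ".channels." + channel + ".transactionCapacity", ASSUMED_DEFAULT).trim());
        return batchSize <= txnCapacity;
    }

    public static void main(String[] args) throws IOException {
        // The relevant lines from the config in this report.
        String config = String.join("\n",
                "agent.channels.ch2.type=memory",
                "agent.channels.ch2.capacity=100",
                "agent.channels.ch2.transactionCapacity=5",
                "agent.sinks.sink1.type = hdfs",
                "agent.sinks.sink1.channel = ch2",
                "agent.sinks.sink1.hdfs.batchSize = 10");
        Properties p = new Properties();
        p.load(new StringReader(config));
        System.out.println(batchFits(p, "agent", "sink1")); // false: 10 > 5

        // Lowering the sink batch to the transaction capacity passes the check.
        p.setProperty("agent.sinks.sink1.hdfs.batchSize", "5");
        System.out.println(batchFits(p, "agent", "sink1")); // true: 5 <= 5
    }
}
```

Keeping hdfs.batchSize at or below transactionCapacity means a single sink transaction never needs more take-list slots than the channel provides.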
      

      With this config, the agent throws exceptions like the following:

      org.apache.flume.ChannelException: Take list for MemoryTransaction, capacity 5 full, consider committing more frequently, increasing capacity, or increasing thread count
              at org.apache.flume.channel.MemoryChannel$MemoryTransaction.doTake(MemoryChannel.java:99)
              at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
              at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
              at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:362)
              at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
              at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
              at java.lang.Thread.run(Thread.java:745)
      17/06/09 09:48:04 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
      org.apache.flume.EventDeliveryException: org.apache.flume.ChannelException: Take list for MemoryTransaction, capacity 5 full, consider committing more frequently, increasing capacity, or increasing thread count
              at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:451)
              at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
              at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
              at java.lang.Thread.run(Thread.java:745)
      

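      When the Memory Channel's takeList is full, a ChannelException is thrown. By then, the events in the takeList have already been written by the sink, yet they are also rolled back to the MemoryChannel's queue. On the next attempt the sink takes and writes the same events again, hits the same exception, and the cycle repeats without end. This is not reasonable.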
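The loop described above can be illustrated with a simplified Java model. It is not Flume's MemoryChannel or HDFSEventSink; the class and method names are hypothetical. It assumes, consistent with the stack trace, that the sink writes each event as soon as it takes it, that the take-list capacity check happens before each take, and that rollback returns the taken events to the head of the queue in their original order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Simplified model (hypothetical, not Flume's classes) of a channel whose take
// list is bounded by transactionCapacity, drained by a sink that writes each
// event before its transaction commits.
public class RollbackLoop {

    static final class TakeListFullException extends RuntimeException {
        TakeListFullException(String msg) { super(msg); }
    }

    final Deque<String> queue = new ArrayDeque<>();   // events stored in the channel
    final List<String> written = new ArrayList<>();   // stands in for the HDFS output
    final int transactionCapacity;

    RollbackLoop(int transactionCapacity) { this.transactionCapacity = transactionCapacity; }

    // One sink process() call: take up to batchSize events in one transaction.
    // Returns true on commit, false on rollback.
    boolean process(int batchSize) {
        Deque<String> takeList = new ArrayDeque<>();
        try {
            for (int i = 0; i < batchSize; i++) {
                // Capacity is checked before each take, so the (capacity+1)-th take fails.
                if (takeList.size() == transactionCapacity) {
                    throw new TakeListFullException("take list capacity " + transactionCapacity + " full");
                }
                String event = queue.pollFirst();
                if (event == null) break;   // channel empty: commit a partial batch
                takeList.addLast(event);
                written.add(event);         // written before the transaction commits
            }
            return true;                    // commit: taken events leave the channel
        } catch (TakeListFullException e) {
            // Rollback: put taken events back at the head of the queue, order preserved.
            while (!takeList.isEmpty()) {
                queue.addFirst(takeList.pollLast());
            }
            return false;                   // but the writes above are not undone
        }
    }

    public static void main(String[] args) {
        RollbackLoop ch = new RollbackLoop(5);                 // transactionCapacity = 5
        for (int i = 1; i <= 7; i++) ch.queue.addLast("e" + i);
        for (int attempt = 1; attempt <= 3; attempt++) {
            boolean committed = ch.process(10);                // hdfs.batchSize = 10
            System.out.println("attempt " + attempt + ": committed=" + committed
                    + ", written so far=" + ch.written.size()
                    + ", left in channel=" + ch.queue.size());
        }
    }
}
```

In this model every attempt writes e1 through e5 again and then rolls back, so the output grows by five duplicates per attempt while the channel never drains. Calling process(5) instead commits the batch and removes those events from the channel.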

        Attachments

        1. FLUME-3106-0.patch
          1.0 kB
          Yongxi Zhang

              People

              • Assignee:
                Unassigned
                Reporter:
                xyz2277 Yongxi Zhang
              • Votes:
                0
                Watchers:
                3

                Dates

                • Created:
                  Updated:
                  Resolved: