Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.8.0
Fix Version/s: None
Environment:
Java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
Hadoop 2.6.3
lzop native lib: hadoop-lzo-0.4.20-SNAPSHOT.jar
Description
A Flume process configured with the following parameters causes this problem:
Configuration
spool_flume1.sources = spool-source-spool
spool_flume1.channels = hdfs-channel-spool
spool_flume1.sinks = hdfs-sink-spool
spool_flume1.sources.spool-source-spool.type = spooldir
spool_flume1.sources.spool-source-spool.channels = hdfs-channel-spool
spool_flume1.sources.spool-source-spool.spoolDir = /home/test/flume_log
spool_flume1.sources.spool-source-spool.recursiveDirectorySearch = true
spool_flume1.sources.spool-source-spool.fileHeader = true
spool_flume1.sources.spool-source-spool.deserializer = LINE
spool_flume1.sources.spool-source-spool.deserializer.maxLineLength = 100000000
spool_flume1.sources.spool-source-spool.inputCharset = UTF-8
spool_flume1.sources.spool-source-spool.basenameHeader = true
spool_flume1.sources.spool-source-spool.includePattern = log.*-1_2018.*$
spool_flume1.sources.spool-source-spool.batchSize = 100
spool_flume1.channels.hdfs-channel-spool.type = memory
spool_flume1.channels.hdfs-channel-spool.keep-alive = 60
spool_flume1.channels.hdfs-channel-spool.capacity = 1000
spool_flume1.channels.hdfs-channel-spool.transactionCapacity = 100
spool_flume1.sinks.hdfs-sink-spool.channel = hdfs-channel-spool
spool_flume1.sinks.hdfs-sink-spool.type = hdfs
spool_flume1.sinks.hdfs-sink-spool.hdfs.writeFormat = Text
spool_flume1.sinks.hdfs-sink-spool.hdfs.fileType = CompressedStream
spool_flume1.sinks.hdfs-sink-spool.hdfs.codeC = lzop
spool_flume1.sinks.hdfs-sink-spool.hdfs.threadsPoolSize = 1
spool_flume1.sinks.hdfs-sink-spool.hdfs.callTimeout = 100000
spool_flume1.sinks.hdfs-sink-spool.hdfs.idleTimeout = 36
spool_flume1.sinks.hdfs-sink-spool.hdfs.useLocalTimeStamp = true
spool_flume1.sinks.hdfs-sink-spool.hdfs.filePrefix = %{basename}
spool_flume1.sinks.hdfs-sink-spool.hdfs.path = /user/test/flume_test
spool_flume1.sinks.hdfs-sink-spool.hdfs.rollCount = 0
spool_flume1.sinks.hdfs-sink-spool.hdfs.rollSize = 134217728
spool_flume1.sinks.hdfs-sink-spool.hdfs.rollInterval = 0
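For context on where each event ends up: because basenameHeader = true, the spooldir source stamps every event with a basename header, and the HDFS sink expands the %{basename} escape in hdfs.filePrefix from that header, so each distinct source file resolves to its own target file on HDFS. Below is a minimal sketch of that expansion using Flume's BucketPath helper (illustrative only; I am assuming the two-argument escapeString overload from Flume 1.8, and the file name in the sketch is hypothetical):

import java.util.HashMap;
import java.util.Map;

import org.apache.flume.formatter.output.BucketPath;

public class BasenameEscapeDemo {
    public static void main(String[] args) {
        // The spooldir source sets this header because basenameHeader = true.
        Map<String, String> headers = new HashMap<>();
        headers.put("basename", "log.app-1_20180101"); // hypothetical source file name

        // The HDFS sink expands %{basename} in hdfs.filePrefix the same way,
        // so every distinct basename maps to its own file prefix on HDFS.
        String prefix = BucketPath.escapeString("%{basename}", headers);
        System.out.println(prefix); // prints: log.app-1_20180101
    }
}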
The test data adds up to 4.2 GB, amounting to 5,271,962 lines.
The expected result is data stored in lzop format on HDFS, with files named %{basename}_%{LocalTimeStamp}.
However, in my tests the sink data was mixed across different files, and the total amount of data uploaded to HDFS was less than the local data.
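To quantify the loss, the local line count can be compared with the number of lines actually readable from the .lzo files on HDFS. Here is a minimal sketch of the HDFS-side count (my own verification helper, not part of Flume; it assumes the hadoop-lzo LzopCodec class is on the classpath and that fs.defaultFS points at the cluster):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class HdfsLzoLineCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Register the hadoop-lzo codec so .lzo files are matched by extension.
        conf.set("io.compression.codecs", "com.hadoop.compression.lzo.LzopCodec");
        FileSystem fs = FileSystem.get(conf);
        CompressionCodecFactory codecs = new CompressionCodecFactory(conf);

        long total = 0;
        for (FileStatus status : fs.listStatus(new Path("/user/test/flume_test"))) {
            CompressionCodec codec = codecs.getCodec(status.getPath());
            if (codec == null) {
                continue; // skip .tmp and other non-.lzo files
            }
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    codec.createInputStream(fs.open(status.getPath())),
                    StandardCharsets.UTF_8))) {
                while (reader.readLine() != null) {
                    total++;
                }
            }
        }
        // Compare against the local total of 5,271,962 lines.
        System.out.println("lines readable on HDFS: " + total);
    }
}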
Test cases are listed below:
- using DataStream: uploading works normally, whether filePrefix = %{basename} is set or not
- using CompressedStream with hdfs.codeC = lzop:
- with filePrefix left at its default, uploading works normally
- with filePrefix = %{basename}, data is mixed across files and lost (see the sketch after this list)
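I can only guess at the mechanism, but one property of the compression stack seems relevant: Hadoop's CodecPool reuses stateful Compressor instances, so if a bucket writer kept using a compressor after it was returned to the pool, or two writers were handed the same instance, their compressed output could interleave, which would look like the mixing above. This is a suspicion, not a confirmed root cause; the sketch below only demonstrates the pool reuse itself, not the Flume bug:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.io.compress.DefaultCodec;

public class CompressorPoolReuseDemo {
    public static void main(String[] args) {
        // DefaultCodec stands in for lzop here; pooling works the same way.
        DefaultCodec codec = new DefaultCodec();
        codec.setConf(new Configuration());

        Compressor first = CodecPool.getCompressor(codec);
        CodecPool.returnCompressor(first);

        // The pool leases the same stateful instance back out, so any writer
        // still holding "first" would now share compressor state with the
        // writer that received "second".
        Compressor second = CodecPool.getCompressor(codec);
        System.out.println("same instance: " + (first == second));
    }
}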
Strangely, when I shut down the Flume agent process, flume.log prints the correct totals, but the data actually uploaded to HDFS is less than that. The log file is attached at the end.