Description
Using this config:
logs-2 default-flow collectorSource(9109) collectorSink("file:///data/log/flumed/%/dt=%Y-%m-%d", "%k-json", 900000)
agent1 default-flow rpcSource(9108) agentBESink("logs-2", 9109)
(assuming a max body size of 30kb)
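To reproduce, one way to generate an oversized event is to write a single line just over 30kb and feed it into the flow. A minimal sketch (Python) follows; the file path and size are arbitrary choices, and you would still need to deliver the line to agent1 yourself (the setup above receives events over rpcSource(9108)):

# Minimal sketch: write one log line just over 30 KB so it can be fed into
# the flow to reproduce the failure. Path and size are hypothetical.
PAYLOAD_PATH = "/tmp/oversized-event.log"
BODY_SIZE = 31 * 1024

with open(PAYLOAD_PATH, "w") as f:
    f.write("x" * BODY_SIZE + "\n")
print("wrote one %d-byte line to %s" % (BODY_SIZE, PAYLOAD_PATH))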
If an event with a body larger than 30kb is passed through agent1 => logs-2, then during the file roll logs-2 will SILENTLY fail, leaving only repeated exception messages in the log files (at DEBUG severity).
[see attached log snippet]
The only log entry with a severity higher than INFO is this one:
WARN com.cloudera.flume.handlers.rolling.RollSink: TriggerThread interrupted
which is totally useless for debugging
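Because the only real evidence is buried at DEBUG severity, a quick way to surface the swallowed exceptions is to scan the collector's log file. A rough sketch (Python), assuming a log4j-style plain-text log and a hypothetical log path; adjust both to your installation:

# Rough sketch: scan the collector's log for exception lines that are
# otherwise only visible at DEBUG severity. Log path and format are assumptions.
import re

LOG_PATH = "/var/log/flume/flume-logs-2.log"  # hypothetical path
pattern = re.compile(r"\b(DEBUG|WARN|ERROR)\b.*(Exception|interrupted)")

with open(LOG_PATH) as f:
    for lineno, line in enumerate(f, 1):
        if pattern.search(line):
            print("%d: %s" % (lineno, line.rstrip()))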
Other information / speculation:
1) The same thing happens on agent1 inside the agentSink if you wrap the agentSink in collector(15000){}: when the connection rolls, the same exception and behavior occur. In short, it happens whenever a roll takes place.
2) I'm not sure whether any event larger than 30kb triggers this, or whether the event has to arrive at a particular time relative to the file roll.
3) I have no idea why this happens at roll-time and not during regular event collection.
4) I don't know whether this is directly related to the use of the rpcSink. I know the internal communication mechanisms share code with that sink, so perhaps the regular event-size checks do not happen (see the sketch after this list)?
5) Using the flume shell, running getnodestatus reports that all nodes are active, despite this problem.
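Regarding point 4: a plain event-size guard on the sending path would turn this into a loud, immediate failure instead of a silent one at roll time. The sketch below (Python) is illustrative only, not Flume code; the 30kb limit is taken from the assumption stated above:

# Illustrative only -- NOT Flume code. Shows the kind of up-front size check
# the report speculates may be missing on the rpcSink/internal path.
MAX_EVENT_BODY_BYTES = 30 * 1024  # limit assumed from "max body size of 30kb"

def check_event_size(body: bytes) -> None:
    # Fail loudly before sending instead of failing silently at roll time.
    if len(body) > MAX_EVENT_BODY_BYTES:
        raise ValueError("event body is %d bytes, exceeds %d-byte limit"
                         % (len(body), MAX_EVENT_BODY_BYTES))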