Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Won't Fix
Affects Version/s: 0.9.4
Fix Version/s: None
Environment: RHEL 5
Description
We have
2 agent nodes
1 collector
1 master
8 flows
At first all of them work fine, and a Thrift server is started for each flow:
grep "Starting blocking thread pool server on port" flume-flume-node-server6.log.2011-11-29
2011-11-29 13:21:24,562 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35855...
2011-11-29 13:21:54,572 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35857...
2011-11-29 13:22:24,581 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35853...
2011-11-29 13:22:54,589 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35858...
2011-11-29 13:23:24,597 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35860...
2011-11-29 13:23:54,607 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35859...
2011-11-29 13:24:24,615 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35854...
2011-11-29 13:24:54,625 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35856...
At some point after startup, two of them stopped for no visible reason:
flume-flume-node-server6.log.2011-12-01:2011-12-01 17:27:15,523 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35858...
flume-flume-node-server6.log.2011-12-05:2011-12-05 02:50:56,748 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35857...
This stopped the data flow from two of the sources.
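One way to spot the affected flows is to compare the ports that were opened with the ports that were later closed. The following is a minimal Python sketch, assuming the two ThriftEventSource message formats quoted in this report and log lines fed in chronological order. The function name `find_closed_ports` and the inline sample are illustrative only and are not part of Flume.

```python
import re

# Patterns based on the ThriftEventSource log lines quoted in this report.
STARTED = re.compile(r"Starting blocking thread pool server on port (\d+)")
CLOSED = re.compile(r"Closed server on port (\d+)")


def find_closed_ports(lines):
    """Return sorted ports whose most recent event is a close after a start.

    Assumes the lines are supplied in chronological order.
    """
    seen = set()    # ports that were started at least once
    closed = set()  # ports currently considered closed
    for line in lines:
        match = STARTED.search(line)
        if match:
            port = int(match.group(1))
            seen.add(port)
            closed.discard(port)  # a restart reopens the port
            continue
        match = CLOSED.search(line)
        if match:
            port = int(match.group(1))
            if port in seen:
                closed.add(port)
    return sorted(closed)


# Sample input: a subset of the log lines from this report.
SAMPLE_LOG = [
    "2011-11-29 13:21:24,562 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35855...",
    "2011-11-29 13:21:54,572 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35857...",
    "2011-11-29 13:22:54,589 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35858...",
    "2011-12-01 17:27:15,523 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35858...",
    "2011-12-05 02:50:56,748 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35857...",
]

if __name__ == "__main__":
    print(find_closed_ports(SAMPLE_LOG))
```

When the rotated log files are concatenated in date order and passed in, the returned ports can be matched back to the collector flows listening on them.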
Log collection was configured like this (for both aa and bb) using "flume
shell -c server5 -s flume-aa.txt":
cat flume-aa.txt
exec map server3 aa-agent-http-fe-1
exec map server3 aa-agent-http-fe-2
exec map server3 aa-agent-https-fe-1
exec map server3 aa-agent-https-fe-2
exec map server3 aa-agent-http-error-fe-1
exec map server3 aa-agent-http-error-fe-2
exec map server3 aa-agent-https-error-fe-1
exec map server3 aa-agent-https-error-fe-2
exec map server6 aa-collector-http-fe
exec map server6 aa-collector-https-fe
exec map server6 aa-collector-http-error-fe
exec map server6 aa-collector-https-error-fe
# HTTP
exec config aa-agent-http-fe-1 aa-flow-http-fe
'tailDir("/logs/aa/httpd-fe-1/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
true)' autoE2EChain
exec config aa-agent-http-fe-2 aa-flow-http-fe
'tailDir("/logs/aa/httpd-fe-2/", "aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
true)' autoE2EChain
exec config aa-collector-http-fe aa-flow-http-fe autoCollectorSource
'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
"%
# HTTPS
exec config aa-agent-https-fe-1 aa-flow-https-fe
'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
true)' autoE2EChain
exec config aa-agent-https-fe-2 aa-flow-https-fe
'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_access_log-\\d{4}-\\d{2}-\\d{2}$",
true)' autoE2EChain
exec config aa-collector-https-fe aa-flow-https-fe autoCollectorSource
'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
"%{host}ssl-access")'
# HTTP ERROR
exec config aa-agent-http-error-fe-1 aa-flow-http-error-fe
'tailDir("/logs/aa/httpd-fe-1/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
true)' autoE2EChain
exec config aa-agent-http-error-fe-2 aa-flow-http-error-fe
'tailDir("/logs/aa/httpd-fe-2/", "aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
true)' autoE2EChain
exec config aa-collector-http-error-fe aa-flow-http-error-fe
autoCollectorSource
'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
"%{host}error")'
# HTTPS ERROR
exec config aa-agent-https-error-fe-1 aa-flow-https-error-fe
'tailDir("/logs/aa/httpd-fe-1/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
true)' autoE2EChain
exec config aa-agent-https-error-fe-2 aa-flow-https-error-fe
'tailDir("/logs/aa/httpd-fe-2/", "ssl-aa_error_log-\\d{4}-\\d{2}-\\d{2}$",
true)' autoE2EChain
exec config aa-collector-https-error-fe aa-flow-https-error-fe
autoCollectorSource
'collectorSink("hdfs://hdfs-server:8020/flume/aa/httpd-fe/%Y-%m-%d/",
"%{host}ssl-error")'
waitForNodesActive 0 aa-agent-http-fe-1 aa-agent-http-fe-2
aa-agent-https-fe-1 aa-agent-https-fe-2 aa-agent-http-error-fe-1
aa-agent-http-error-fe-2 aa-agent-https-error-fe-1
aa-agent-https-error-fe-2 aa-collector-http-fe aa-collector-https-fe
aa-collector-http-error-fe aa-collector-https-error-fe
exec refreshAll
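Since the HTTP and HTTPS logs live in the same directories, it may help to check which file names each tailDir pattern selects. Below is a small Python sketch for the access-log pattern aa_access_log-\\d{4}-\\d{2}-\\d{2}$. It rests on two assumptions that should be verified against the Flume version in use: that the doubled backslash in the config is a string escape for a single regex backslash, and that the pattern must match the whole file name. The helper `is_tailed` and the sample file names are hypothetical.

```python
import re

# Regex from the HTTP agent config, with the config-level "\\d" written
# as a single-backslash "\d" (assumed string-escape behaviour).
ACCESS_LOG = re.compile(r"aa_access_log-\d{4}-\d{2}-\d{2}$")


def is_tailed(filename):
    """Return True if the whole file name matches the pattern.

    Full-match semantics are an assumption, not confirmed Flume behaviour.
    """
    return ACCESS_LOG.fullmatch(filename) is not None


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    for name in ("aa_access_log-2011-11-29",
                 "ssl-aa_access_log-2011-11-29",
                 "aa_access_log"):
        print(name, is_tailed(name))
```

Under these assumptions the HTTP pattern skips the ssl- prefixed files. If the source instead used substring matching, the same pattern would also pick up the HTTPS logs, because it has no leading anchor.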
There is a bit more information in this email thread:
http://mail-archives.apache.org/mod_mbox/incubator-flume-user/201111.mbox/%3CCAOwicogsR_PZ3fY4TnRGOWwLwmjATJZ5LkMLTKrAbc74Ce5PKw@mail.gmail.com%3E