Details
Description
After a certain time I received this error, nothing was written to the log until I restart the collector:
2012-02-24 16:57:41,553 ERROR com.cloudera.flume.core.connector.DirectDriver: Exiting driver logicalNode hdflume01.lamp.com-19 in error state CollectorSource | Collector because null
My problem is the collector server is still marked as active and all the agents wont failover to the next collector, because the collector itself is still up.
My configurations is the following:
collectorSource collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
collectorSource collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
collectorSource collectorSink("hdfs://hadoop01.lamp.com:54310/logs/%Y/%m/%d/%H/", "-")
tail("/var/log/nginx/localhost.access.log") agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")
tail("/var/log/nginx/localhost.access.log") agentE2EChain("hdflume01.lamp.com", "hdflume03.lamp.com")
I think was able to reproduce this error with two virtual machine when I was changing the same configuration of the collector to:
collectorSource console
2012-02-25 13:46:17,711 INFO com.cloudera.flume.collector.CollectorSource: closed
2012-02-25 13:46:18,712 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35853...
2012-02-25 13:46:18,713 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 1 elements ...
2012-02-25 13:46:19,942 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 1 heartbeats
2012-02-25 13:46:24,944 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 2 heartbeats
2012-02-25 13:46:29,946 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 3 heartbeats
2012-02-25 13:46:34,949 WARN com.cloudera.flume.agent.LivenessManager: Heartbeats are backing up, currently behind by 4 heartbeats
2012-02-25 13:46:39,941 ERROR com.cloudera.flume.agent.LogicalNode: Forcing driver to exit uncleanly
2012-02-25 13:46:39,942 ERROR com.cloudera.flume.handlers.thrift.ThriftEventSource: Unexpected interrupt of close sleep interrupted
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
at com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
2012-02-25 13:46:39,943 ERROR com.cloudera.flume.core.connector.DirectDriver: Closing down due to exception during close calls
2012-02-25 13:46:39,943 INFO com.cloudera.flume.core.connector.DirectDriver: Connector logicalNode gfiles02.lamp.com-18 exited with error: sleep interrupted
java.lang.InterruptedException: sleep interrupted
at java.lang.Thread.sleep(Native Method)
at com.cloudera.flume.handlers.thrift.ThriftEventSource.close(ThriftEventSource.java:191)
at com.cloudera.flume.collector.CollectorSource.close(CollectorSource.java:61)
at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:166)
2012-02-25 13:46:39,944 INFO com.cloudera.flume.collector.CollectorSource: closed
2012-02-25 13:46:39,944 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35853...
2012-02-25 13:46:39,944 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 2 elements ...
2012-02-25 13:46:39,944 INFO com.cloudera.flume.agent.LogicalNode: Node config successfully set to FlumeConfigData:
2012-02-25 13:46:39,957 INFO com.cloudera.flume.collector.CollectorSource: opened
2012-02-25 13:46:39,988 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35853...
2012-02-25 13:46:39,997 INFO com.cloudera.flume.handlers.debug.ConsoleEventSink: ConsoleEventSink( debug ) opened
2012-02-25 13:46:49,978 WARN com.cloudera.flume.handlers.thrift.ThriftEventSource: Close timed out due to no progress. Closing despite having 2 values still enqueued
2012-02-25 13:46:49,978 INFO com.cloudera.flume.handlers.rolling.RollSink: closing RollSink 'escapedCustomDfs("hdfs://hh01.hadoop.com:54310/logs/%Y/%m/%d/%H/","-%
" )'
2012-02-25 13:46:49,979 WARN com.cloudera.flume.handlers.endtoend.AckChecksumChecker: partial acks abandoned:
2012-02-25 13:46:49,979 ERROR com.cloudera.flume.core.connector.DirectDriver: Exiting driver logicalNode gfiles02.lamp.com-18 in error state CollectorSource | Collector because sleep interrupted
In both case the error state that the node is in error state but they really are in an active state.