Details
Description
The collector can't start correctly if NameNode is down. The collector goes to ERROR state, but it can be recovered by 'refresh' command.
There are a sample configs:
exec config azotov-ws 'text("/tmp/test.log")' 'agentE2ESink("localhost",35867)'
exec config idea-collector 'collectorSource(35867)' 'collector( 10000 ) {escapedFormatDfs("hdfs://localhost:8020/tmp/test3/", "flume-%
There are a logs:
11/08/30 15:07:41 INFO conf.FlumeConfiguration: Loading configurations from /etc/flume/conf
11/08/30 15:07:41 WARN agent.FlumeNode: Log directory is writing inside of /tmp. This data may not survive reboot!
11/08/30 15:07:41 WARN text.FormatFactory: Unable to load output format plugin class - Class not found
11/08/30 15:07:41 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
11/08/30 15:07:41 INFO util.InternalHttpServer: Starting internal HTTP server
11/08/30 15:07:41 INFO mortbay.log: jetty-6.1.26
11/08/30 15:07:41 INFO mortbay.log: Extract /usr/lib/flume/webapps/flumeagent.war to /tmp/Jetty_0_0_0_0_35862_flumeagent.war_flumeagent_4cgrz3/webapp
11/08/30 15:07:42 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:35862
11/08/30 15:07:42 INFO util.InternalHttpServer: Server started on port 35862
11/08/30 15:07:42 INFO agent.LogicalNodeManager: creating new logical node idea-collector
11/08/30 15:07:42 INFO agent.MultiMasterRPC: No active master RPC connection
11/08/30 15:07:42 INFO agent.LogicalNodeManager: Loading node name with FlumeConfigData: {srcVer:'Thu Jan 01 03:00:00 MSK 1970' snkVer:'Thu Jan 01 03:00:00 MSK 1970' ts='Thu Jan 01 03:00:00 MSK 1970' flowId:'null' source:'null' sink:'null' }
11/08/30 15:07:42 INFO agent.ThriftMasterRPC: Connected to master at localhost:35872
11/08/30 15:07:42 INFO agent.LogicalNode: Node config successfully set to FlumeConfigData: {srcVer:'Thu Jan 01 03:00:00 MSK 1970' snkVer:'Thu Jan 01 03:00:00 MSK 1970' ts='Thu Jan 01 03:00:00 MSK 1970' flowId:'null' source:'null' sink:'null' }
11/08/30 15:07:42 INFO agent.FlumeNode: Hadoop Security enabled: false
11/08/30 15:07:47 INFO rolling.RollSink: Created RollSink: trigger=[TimeTrigger: maxAge=10000 tagger=com.cloudera.flume.handlers.rolling.ProcessTagger@317cfd38] checkPeriodMs = 250 spec='escapedFormatDfs( "hdfs://localhost:8020/tmp/test3/", "flume-%{rolltag}
.log" )'
11/08/30 15:07:47 INFO collector.CollectorSource: opened
11/08/30 15:07:47 INFO agent.LogicalNode: Node config successfully set to FlumeConfigData: {srcVer:'Tue Aug 30 15:05:48 MSD 2011' snkVer:'Tue Aug 30 15:05:48 MSD 2011' ts='Tue Aug 30 15:05:48 MSD 2011' flowId:'default-flow' source:'collectorSource( 35867 )' sink:'collector( 10000 ) { escapedFormatDfs( "hdfs://localhost:8020/tmp/test3/", "flume-%
11/08/30 15:07:47 INFO thrift.ThriftEventSource: Starting blocking thread pool server on port 35867...
11/08/30 15:07:47 INFO rolling.RollSink: opening RollSink 'escapedFormatDfs( "hdfs://localhost:8020/tmp/test3/", "flume-%{rolltag}
.log" )'
11/08/30 15:07:47 INFO debug.InsistentOpenDecorator: Opened MaskDecorator on try 0
11/08/30 15:08:23 INFO endtoend.AckChecksumChecker: Starting checksum group called 20110830-145747661+0400.678151906732325.00000027
11/08/30 15:08:23 INFO endtoend.AckChecksumChecker: initial checksum is 1321a567a8e
11/08/30 15:08:23 INFO hdfs.EscapedCustomDfsSink: Opening hdfs://localhost:8020/tmp/test3/flume-20110830-150817116+0400.678781361271117.00000022.log
11/08/30 15:08:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
11/08/30 15:08:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s).
11/08/30 15:08:27 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s).
11/08/30 15:08:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s).
11/08/30 15:08:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s).
11/08/30 15:08:28 INFO debug.StubbornAppendSink: append Interrupted event 'azotov-ws [INFO Tue Aug 30 14:57:47 MSD 2011]
{ AckTag : 20110830-145747661+0400.678151906732325.00000027 }
{ AckType : msg }
2011-08-15 07:01:58\t2006\t153897273691618\t00008C52000000006A6B2F\twww.ancestry.com/\t228\tNULL\t4100\t10162981=50|10163002=50|10163004=50|10162970=50\tNULL\t2\tNULL\tUS\t217\tCOMPUTER\tWINDOWS_7\tCHROME\t-1120233552\t1141025753' with error: Blocked append interrupted by rotation event
11/08/30 15:08:28 INFO rolling.RollSink: closing RollSink 'escapedFormatDfs( "hdfs://localhost:8020/tmp/test3/", "flume-%
11/08/30 15:08:28 ERROR connector.DirectDriver: Closing down due to exception during append calls
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1223)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:976)
at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
at com.cloudera.flume.handlers.debug.InsistentOpenDecorator.close(InsistentOpenDecorator.java:175)
at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
at com.cloudera.flume.handlers.debug.StubbornAppendSink.append(StubbornAppendSink.java:78)
at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
at com.cloudera.flume.handlers.debug.InsistentAppendDecorator.append(InsistentAppendDecorator.java:110)
at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
at com.cloudera.flume.handlers.endtoend.AckChecksumChecker.append(AckChecksumChecker.java:172)
at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
at com.cloudera.flume.handlers.batch.UnbatchingDecorator.append(UnbatchingDecorator.java:62)
at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
at com.cloudera.flume.handlers.batch.GunzipDecorator.append(GunzipDecorator.java:81)
at com.cloudera.flume.collector.CollectorSink.append(CollectorSink.java:222)
at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:110)
11/08/30 15:08:28 INFO connector.DirectDriver: Connector logicalNode idea-collector-20 exited with error: null
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1223)
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:976)
at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
at com.cloudera.flume.handlers.debug.InsistentOpenDecorator.close(InsistentOpenDecorator.java:175)
at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
at com.cloudera.flume.handlers.debug.StubbornAppendSink.append(StubbornAppendSink.java:78)
at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
at com.cloudera.flume.handlers.debug.InsistentAppendDecorator.append(InsistentAppendDecorator.java:110)
at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
at com.cloudera.flume.handlers.endtoend.AckChecksumChecker.append(AckChecksumChecker.java:172)
at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
at com.cloudera.flume.handlers.batch.UnbatchingDecorator.append(UnbatchingDecorator.java:62)
at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
at com.cloudera.flume.handlers.batch.GunzipDecorator.append(GunzipDecorator.java:81)
at com.cloudera.flume.collector.CollectorSink.append(CollectorSink.java:222)
at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:110)
11/08/30 15:08:28 INFO collector.CollectorSource: closed
11/08/30 15:08:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s).
11/08/30 15:08:29 INFO thrift.ThriftEventSource: Closed server on port 35867...
11/08/30 15:08:29 INFO thrift.ThriftEventSource: Queue still has 1000 elements ...
11/08/30 15:08:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s).
11/08/30 15:08:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s).
11/08/30 15:08:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s).
11/08/30 15:08:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
11/08/30 15:08:39 WARN thrift.ThriftEventSource: Close timed out due to no progress. Closing despite having 1000 values still enqueued
11/08/30 15:08:39 INFO rolling.RollSink: closing RollSink 'escapedFormatDfs( "hdfs://localhost:8020/tmp/test3/", "flume-%{rolltag}
.log" )'
11/08/30 15:08:39 WARN endtoend.AckChecksumChecker: partial acks abandoned:
11/08/30 15:08:39 ERROR connector.DirectDriver: Exiting driver logicalNode idea-collector-20 in error state CollectorSource | Collector because null