Uploaded image for project: 'Flume'
  1. Flume
  2. FLUME-754

Collector can't start correctly if NameNode is down for E2E reliability mode.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Critical
    • Resolution: Won't Fix
    • 0.9.4
    • 0.9.5
    • Node
    • Centos 5.5, cdh3u1, java 1.6.0_20.
      Ubuntu 10.04 x64, cdh3u1, pseudo-distributed mode, java 1.6.0_24.

    Description

      The collector can't start correctly if NameNode is down. The collector goes to ERROR state, but it can be recovered by 'refresh' command.

      There are a sample configs:
      exec config azotov-ws 'text("/tmp/test.log")' 'agentE2ESink("localhost",35867)'
      exec config idea-collector 'collectorSource(35867)' 'collector( 10000 ) {escapedFormatDfs("hdfs://localhost:8020/tmp/test3/", "flume-%

      {rolltag}.log")}'

      There are a logs:
      11/08/30 15:07:41 INFO conf.FlumeConfiguration: Loading configurations from /etc/flume/conf
      11/08/30 15:07:41 WARN agent.FlumeNode: Log directory is writing inside of /tmp. This data may not survive reboot!
      11/08/30 15:07:41 WARN text.FormatFactory: Unable to load output format plugin class - Class not found
      11/08/30 15:07:41 INFO mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
      11/08/30 15:07:41 INFO util.InternalHttpServer: Starting internal HTTP server
      11/08/30 15:07:41 INFO mortbay.log: jetty-6.1.26
      11/08/30 15:07:41 INFO mortbay.log: Extract /usr/lib/flume/webapps/flumeagent.war to /tmp/Jetty_0_0_0_0_35862_flumeagent.war_flumeagent_4cgrz3/webapp
      11/08/30 15:07:42 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:35862
      11/08/30 15:07:42 INFO util.InternalHttpServer: Server started on port 35862
      11/08/30 15:07:42 INFO agent.LogicalNodeManager: creating new logical node idea-collector
      11/08/30 15:07:42 INFO agent.MultiMasterRPC: No active master RPC connection
      11/08/30 15:07:42 INFO agent.LogicalNodeManager: Loading node name with FlumeConfigData: {srcVer:'Thu Jan 01 03:00:00 MSK 1970' snkVer:'Thu Jan 01 03:00:00 MSK 1970' ts='Thu Jan 01 03:00:00 MSK 1970' flowId:'null' source:'null' sink:'null' }
      11/08/30 15:07:42 INFO agent.ThriftMasterRPC: Connected to master at localhost:35872
      11/08/30 15:07:42 INFO agent.LogicalNode: Node config successfully set to FlumeConfigData: {srcVer:'Thu Jan 01 03:00:00 MSK 1970' snkVer:'Thu Jan 01 03:00:00 MSK 1970' ts='Thu Jan 01 03:00:00 MSK 1970' flowId:'null' source:'null' sink:'null' }
      11/08/30 15:07:42 INFO agent.FlumeNode: Hadoop Security enabled: false
      11/08/30 15:07:47 INFO rolling.RollSink: Created RollSink: trigger=[TimeTrigger: maxAge=10000 tagger=com.cloudera.flume.handlers.rolling.ProcessTagger@317cfd38] checkPeriodMs = 250 spec='escapedFormatDfs( "hdfs://localhost:8020/tmp/test3/", "flume-%{rolltag}

      .log" )'
      11/08/30 15:07:47 INFO collector.CollectorSource: opened
      11/08/30 15:07:47 INFO agent.LogicalNode: Node config successfully set to FlumeConfigData: {srcVer:'Tue Aug 30 15:05:48 MSD 2011' snkVer:'Tue Aug 30 15:05:48 MSD 2011' ts='Tue Aug 30 15:05:48 MSD 2011' flowId:'default-flow' source:'collectorSource( 35867 )' sink:'collector( 10000 ) { escapedFormatDfs( "hdfs://localhost:8020/tmp/test3/", "flume-%

      {rolltag}.log" ) }' }
      11/08/30 15:07:47 INFO thrift.ThriftEventSource: Starting blocking thread pool server on port 35867...
      11/08/30 15:07:47 INFO rolling.RollSink: opening RollSink 'escapedFormatDfs( "hdfs://localhost:8020/tmp/test3/", "flume-%{rolltag}

      .log" )'
      11/08/30 15:07:47 INFO debug.InsistentOpenDecorator: Opened MaskDecorator on try 0
      11/08/30 15:08:23 INFO endtoend.AckChecksumChecker: Starting checksum group called 20110830-145747661+0400.678151906732325.00000027
      11/08/30 15:08:23 INFO endtoend.AckChecksumChecker: initial checksum is 1321a567a8e
      11/08/30 15:08:23 INFO hdfs.EscapedCustomDfsSink: Opening hdfs://localhost:8020/tmp/test3/flume-20110830-150817116+0400.678781361271117.00000022.log
      11/08/30 15:08:25 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 0 time(s).
      11/08/30 15:08:26 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 1 time(s).
      11/08/30 15:08:27 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 2 time(s).
      11/08/30 15:08:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 3 time(s).
      11/08/30 15:08:28 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 4 time(s).
      11/08/30 15:08:28 INFO debug.StubbornAppendSink: append Interrupted event 'azotov-ws [INFO Tue Aug 30 14:57:47 MSD 2011]

      { AckChecksum : (long)3036875317 (string) ' 5' (double)1.500415765E-314 }

      { AckTag : 20110830-145747661+0400.678151906732325.00000027 }

      { AckType : msg }

      2011-08-15 07:01:58\t2006\t153897273691618\t00008C52000000006A6B2F\twww.ancestry.com/\t228\tNULL\t4100\t10162981=50|10163002=50|10163004=50|10162970=50\tNULL\t2\tNULL\tUS\t217\tCOMPUTER\tWINDOWS_7\tCHROME\t-1120233552\t1141025753' with error: Blocked append interrupted by rotation event
      11/08/30 15:08:28 INFO rolling.RollSink: closing RollSink 'escapedFormatDfs( "hdfs://localhost:8020/tmp/test3/", "flume-%

      {rolltag}.log" )'
      11/08/30 15:08:28 ERROR connector.DirectDriver: Closing down due to exception during append calls
      java.lang.InterruptedException
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1223)
      at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:976)
      at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
      at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
      at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
      at com.cloudera.flume.handlers.debug.InsistentOpenDecorator.close(InsistentOpenDecorator.java:175)
      at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
      at com.cloudera.flume.handlers.debug.StubbornAppendSink.append(StubbornAppendSink.java:78)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.debug.InsistentAppendDecorator.append(InsistentAppendDecorator.java:110)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.endtoend.AckChecksumChecker.append(AckChecksumChecker.java:172)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.batch.UnbatchingDecorator.append(UnbatchingDecorator.java:62)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.batch.GunzipDecorator.append(GunzipDecorator.java:81)
      at com.cloudera.flume.collector.CollectorSink.append(CollectorSink.java:222)
      at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:110)
      11/08/30 15:08:28 INFO connector.DirectDriver: Connector logicalNode idea-collector-20 exited with error: null
      java.lang.InterruptedException
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1223)
      at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:976)
      at com.cloudera.flume.handlers.rolling.RollSink.close(RollSink.java:296)
      at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
      at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
      at com.cloudera.flume.handlers.debug.InsistentOpenDecorator.close(InsistentOpenDecorator.java:175)
      at com.cloudera.flume.core.EventSinkDecorator.close(EventSinkDecorator.java:67)
      at com.cloudera.flume.handlers.debug.StubbornAppendSink.append(StubbornAppendSink.java:78)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.debug.InsistentAppendDecorator.append(InsistentAppendDecorator.java:110)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.endtoend.AckChecksumChecker.append(AckChecksumChecker.java:172)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.batch.UnbatchingDecorator.append(UnbatchingDecorator.java:62)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.batch.GunzipDecorator.append(GunzipDecorator.java:81)
      at com.cloudera.flume.collector.CollectorSink.append(CollectorSink.java:222)
      at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:110)
      11/08/30 15:08:28 INFO collector.CollectorSource: closed
      11/08/30 15:08:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 5 time(s).
      11/08/30 15:08:29 INFO thrift.ThriftEventSource: Closed server on port 35867...
      11/08/30 15:08:29 INFO thrift.ThriftEventSource: Queue still has 1000 elements ...
      11/08/30 15:08:30 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 6 time(s).
      11/08/30 15:08:31 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 7 time(s).
      11/08/30 15:08:32 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 8 time(s).
      11/08/30 15:08:33 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8020. Already tried 9 time(s).
      11/08/30 15:08:39 WARN thrift.ThriftEventSource: Close timed out due to no progress. Closing despite having 1000 values still enqueued
      11/08/30 15:08:39 INFO rolling.RollSink: closing RollSink 'escapedFormatDfs( "hdfs://localhost:8020/tmp/test3/", "flume-%{rolltag}

      .log" )'
      11/08/30 15:08:39 WARN endtoend.AckChecksumChecker: partial acks abandoned:

      {20110830-145747661+0400.678151906732325.00000027=1314701867662}

      11/08/30 15:08:39 ERROR connector.DirectDriver: Exiting driver logicalNode idea-collector-20 in error state CollectorSource | Collector because null

      Attachments

        Activity

          People

            Unassigned Unassigned
            azotcsit Aleksei Zotov
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: