Flume / FLUME-659

Agent with Thrift rpcSource closes source after receiving new config from master

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.9.3
    • Fix Version/s: None
    • Component/s: Master, Node, Sinks+Sources
    • Environment: Ubuntu 10.10 Maverick Meerkat

    Description

      To reproduce this problem, follow these steps.

      Set up:

      • Master
      • Agent: rpcSource(35092) | agent*(...) # every agent*Sink and agent*Chain variant shows this problem
      • Collector: collectorSource(...) | collectorSink(...)

      Start sending events to the agent over Thrift. While events are still flowing, use the flume shell on the master to configure the agent; the exact config the agent already had is enough to trigger the problem. After the agent receives the configuration, it closes its source server (I have not determined why) and thereafter stops responding to new configurations. Sample output from the agent logs:

      2011-06-15 07:29:04,086 INFO com.cloudera.flume.handlers.thrift.ThriftEventSink: ThriftEventSink on port 35853 closed
      2011-06-15 07:29:05,088 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35092...
      2011-06-15 07:29:05,088 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 4 elements ...

      Because the source server is closed, the application that is sending events sees many errors like the following:

      Thrift::TransportException: Broken pipe
      Thrift::TransportException: Could not connect to localhost:35092: Connection refused - connect(2)
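
      A plain TCP probe from the sending host is one way to confirm that the source port has stopped listening, independent of the Thrift client. The following is a minimal sketch and not part of Flume: `port_is_open` is a hypothetical helper name, and the host and port are taken from the setup above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Port of the agent's rpcSource in the setup above.
    print(port_is_open("localhost", 35092))
```

      Polling this while pushing the config from the master should show the moment the port flips from open to closed, assuming the probe runs on the same host as the agent.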

      Another way to reproduce this error is to bring the master down and then back up, at which point it sends its configuration to the agent node. Upon receiving the new configuration, the agent closes its source server and stops responding to new configurations. The following output is from an agent configured with two logical nodes, one with rpcSource(35090) | agentE2EChain(...) and one with rpcSource(35092) | agentBEChain(...):

      2011-06-15 05:37:46,731 INFO com.cloudera.flume.agent.ThriftMasterRPC: Connected to master at flume-master:35872
      2011-06-15 05:37:51,770 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35090...
      2011-06-15 05:37:51,771 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 0 elements ...
      2011-06-15 05:37:51,787 INFO com.cloudera.flume.handlers.thrift.ThriftEventSink: ThriftEventSink on port 35853 closed
      2011-06-15 05:37:51,868 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35090...
      2011-06-15 05:37:51,868 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 0 elements ...
      2011-06-15 05:37:51,868 WARN com.cloudera.flume.handlers.debug.LazyOpenDecorator: Closing a lazy sink that was not logically opened
      2011-06-15 05:37:51,868 INFO com.cloudera.flume.agent.LogicalNode: flume-agent: Connector stopped: LazyOpenSource | LazyOpenDecorator
      2011-06-15 05:37:51,875 INFO com.cloudera.flume.agent.LogicalNode: Node config successfully set to com.cloudera.flume.conf.FlumeConfigData@42143753
      2011-06-15 05:37:51,880 INFO com.cloudera.flume.agent.LogicalNode: Connector started: LazyOpenSource | LazyOpenDecorator
      2011-06-15 05:37:51,881 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35090...
      2011-06-15 05:37:52,788 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35092...
      2011-06-15 05:37:52,788 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 6 elements ...
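
      In the log above, port 35090 is closed and then started again, while port 35092 is closed with no later start message. A small script can pick out such ports from an agent log. This is only a sketch: `ports_left_closed` is a hypothetical helper, and the two patterns assume the ThriftEventSource message wording shown in these logs.

```python
import re

# Patterns based on the ThriftEventSource messages quoted in this report.
CLOSED = re.compile(r"ThriftEventSource: Closed server on port (\d+)")
STARTED = re.compile(
    r"ThriftEventSource: Starting blocking thread pool server on port (\d+)"
)


def ports_left_closed(log_lines):
    """Return the sorted ports whose most recent source event was a close."""
    last_event = {}
    for line in log_lines:
        closed = CLOSED.search(line)
        if closed:
            last_event[int(closed.group(1))] = "closed"
            continue
        started = STARTED.search(line)
        if started:
            last_event[int(started.group(1))] = "started"
    return sorted(p for p, event in last_event.items() if event == "closed")


sample = [
    "2011-06-15 05:37:51,868 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35090...",
    "2011-06-15 05:37:51,881 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Starting blocking thread pool server on port 35090...",
    "2011-06-15 05:37:52,788 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35092...",
]
print(ports_left_closed(sample))  # prints [35092]
```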

      I once produced an exception using this master-down/master-up procedure:

      2011-06-15 04:50:45,543 ERROR com.cloudera.flume.core.connector.DirectDriver: Driving src/sink failed! LazyOpenSource | LazyOpenDecorator because NaiveFileWALDeco not open for append
      java.lang.IllegalStateException: NaiveFileWALDeco not open for append
      at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
      at com.cloudera.flume.agent.durability.NaiveFileWALDeco.append(NaiveFileWALDeco.java:133)
      at com.cloudera.flume.core.CompositeSink.append(CompositeSink.java:61)
      at com.cloudera.flume.agent.AgentFailChainSink.append(AgentFailChainSink.java:103)
      at com.cloudera.flume.core.EventSinkDecorator.append(EventSinkDecorator.java:60)
      at com.cloudera.flume.handlers.debug.LazyOpenDecorator.append(LazyOpenDecorator.java:75)
      at com.cloudera.flume.core.connector.DirectDriver$PumperThread.run(DirectDriver.java:93)
      2011-06-15 04:50:45,544 INFO com.cloudera.flume.agent.LogicalNode: Connector xxxxxxxx.internal-E2E exited with error NaiveFileWALDeco not open for append
      2011-06-15 04:50:46,544 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Closed server on port 35090...
      2011-06-15 04:50:46,545 INFO com.cloudera.flume.handlers.thrift.ThriftEventSource: Queue still has 6 elements ...
      2011-06-15 04:50:50,443 INFO com.cloudera.flume.agent.AgentFailChainSink: Setting e2e failover chain to { ackedWriteAhead => { stubbornAppend => { insistentOpen => failChain(" %s ","tsink(\"collector1\",35853)","tsink(\"collector2\",35853)") } } }
      2011-06-15 04:50:50,443 INFO com.cloudera.flume.agent.AgentFailChainSink: Setting failover chain to { ackedWriteAhead => { stubbornAppend => { insistentOpen => failChain(" %s ","tsink(\"collector2\",35853)","tsink(\"collector2\",35853)") } } }

          People

            Assignee: Unassigned
            Reporter: flume_clizzin (disabled imported user)