Uploaded image for project: 'Flume'
  1. Flume
  2. FLUME-60

Flume Master becomes "Not connected" to ZK and does not recover

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • 0.9.0
    • 0.9.5
    • Master
    • None

    Description

      This seems related to long pauses or breakpoints. Disconnecting from ZK is probably reasonable in these conditions, but ideally the connection should be recovered.

      As an example, after a long pause, a command that modifies ZK state has this error message:

      Not connected to ZooKeeper: CLOSED

      Here are some logs from the Master:

      10/07/15 21:04:02 INFO master.CommandManager: Submitting command: refreshAll []
      10/07/15 21:04:02 INFO master.CommandManager: Executing command: refreshAll []
      10/07/15 21:04:02 INFO master.ZooKeeperConfigStore: Created new config at /flume-cfgs/cfg-0000000017
      10/07/15 21:04:02 INFO master.ZooKeeperConfigStore: Created new config at /flume-cfgs/cfg-0000000018
      10/07/15 21:04:02 INFO master.ZooKeeperConfigStore: Config was updated - reloading
      10/07/15 21:04:02 INFO rolling.RollSink: Created RollSink: trigger=[TimeTrigger: maxAge=30000 tagger=com.cloudera.flume.handlers.rolling.ProcessTagger@440d8355] checkPeriodMs = 250 spec='collectorSink'
      10/07/15 21:04:02 INFO rolling.RollSink: Created RollSink: trigger=[TimeTrigger: maxAge=30000 tagger=com.cloudera.flume.handlers.rolling.ProcessTagger@5329645a] checkPeriodMs = 250 spec='collectorSink'
      10/07/15 21:04:02 INFO master.ZooKeeperConfigStore: Created new config at /flume-cfgs/cfg-0000000019
      10/07/15 21:04:02 INFO master.ZooKeeperConfigStore: Config was updated - reloading
      10/07/15 21:09:35 INFO server.ZooKeeperServer: Expiring session 0x129d953d26c0000, timeout of 5000ms exceeded
      10/07/15 21:09:35 INFO server.ZooKeeperServer: Expiring session 0x129d953d26c0001, timeout of 5000ms exceeded
      10/07/15 21:09:35 INFO server.PrepRequestProcessor: Processed session termination for sessionid: 0x129d953d26c0000
      10/07/15 21:09:35 INFO server.PrepRequestProcessor: Processed session termination for sessionid: 0x129d953d26c0001
      10/07/15 21:09:35 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 7841ms for sessionid 0x129d953d26c0001, closing socket connection and attempting reconnect
      10/07/15 21:09:35 WARN server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x129d953d26c0001, likely client has closed socket
      10/07/15 21:09:35 INFO server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:51489 which had sessionid 0x129d953d26c0001
      10/07/15 21:09:35 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 7841ms for sessionid 0x129d953d26c0000, closing socket connection and attempting reconnect
      10/07/15 21:09:35 WARN server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x129d953d26c0000, likely client has closed socket
      10/07/15 21:09:35 INFO server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:51488 which had sessionid 0x129d953d26c0000
      10/07/15 21:09:36 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
      10/07/15 21:09:36 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
      10/07/15 21:09:36 INFO server.NIOServerCnxn: Accepted socket connection from /127.0.0.1:38305
      10/07/15 21:09:36 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
      10/07/15 21:09:36 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
      10/07/15 21:09:36 INFO server.NIOServerCnxn: Accepted socket connection from /127.0.0.1:38306
      10/07/15 21:09:36 INFO server.NIOServerCnxn: Client attempting to renew session 0x129d953d26c0001 at /127.0.0.1:38306
      10/07/15 21:09:36 INFO server.NIOServerCnxn: Invalid session 0x129d953d26c0001 for client /127.0.0.1:38306, probably expired
      10/07/15 21:09:36 INFO zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x129d953d26c0001 has expired, closing socket connection
      10/07/15 21:09:36 INFO server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:38306 which had sessionid 0x129d953d26c0001
      10/07/15 21:11:11 INFO server.NIOServerCnxn: Client attempting to renew session 0x129d953d26c0000 at /127.0.0.1:38305
      10/07/15 21:11:11 INFO server.NIOServerCnxn: Invalid session 0x129d953d26c0000 for client /127.0.0.1:38305, probably expired
      10/07/15 21:11:11 INFO server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:38305 which had sessionid 0x129d953d26c0000
      10/07/15 21:11:11 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 95376ms for sessionid 0x129d953d26c0000, closing socket connection and attempting reconnect
      10/07/15 21:11:11 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181
      10/07/15 21:11:11 INFO zookeeper.ClientCnxn: Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
      10/07/15 21:11:11 INFO server.NIOServerCnxn: Accepted socket connection from /0:0:0:0:0:0:0:1:37856
      10/07/15 21:11:11 INFO server.NIOServerCnxn: Client attempting to renew session 0x129d953d26c0000 at /0:0:0:0:0:0:0:1:37856
      10/07/15 21:11:11 INFO server.NIOServerCnxn: Invalid session 0x129d953d26c0000 for client /0:0:0:0:0:0:0:1:37856, probably expired
      10/07/15 21:11:11 INFO zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x129d953d26c0000 has expired, closing socket connection
      10/07/15 21:11:11 INFO server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:37856 which had sessionid 0x129d953d26c0000
      10/07/15 23:33:42 INFO master.CommandManager: Submitting command: refreshAll []
      10/07/15 23:33:42 INFO master.CommandManager: Executing command: refreshAll []
      10/07/15 23:33:42 WARN master.CommandManager: During cmdid:6 refreshAll [] : Not connected to ZooKeeper: CLOSED

      Attachments

        Activity

          People

            Unassigned Unassigned
            jmhsieh Jonathan Hsieh
            Votes:
            2 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: