Description
This seems related to long pauses or breakpoints. Disconnecting from ZK is probably reasonable in these conditions, but ideally the connection should be recovered.
As an example, after a long pause, a command that modifies ZK state has this error message:
Not connected to ZooKeeper: CLOSED
Here are some logs from the Master:
10/07/15 21:04:02 INFO master.CommandManager: Submitting command: refreshAll []
10/07/15 21:04:02 INFO master.CommandManager: Executing command: refreshAll []
10/07/15 21:04:02 INFO master.ZooKeeperConfigStore: Created new config at /flume-cfgs/cfg-0000000017
10/07/15 21:04:02 INFO master.ZooKeeperConfigStore: Created new config at /flume-cfgs/cfg-0000000018
10/07/15 21:04:02 INFO master.ZooKeeperConfigStore: Config was updated - reloading
10/07/15 21:04:02 INFO rolling.RollSink: Created RollSink: trigger=[TimeTrigger: maxAge=30000 tagger=com.cloudera.flume.handlers.rolling.ProcessTagger@440d8355] checkPeriodMs = 250 spec='collectorSink'
10/07/15 21:04:02 INFO rolling.RollSink: Created RollSink: trigger=[TimeTrigger: maxAge=30000 tagger=com.cloudera.flume.handlers.rolling.ProcessTagger@5329645a] checkPeriodMs = 250 spec='collectorSink'
10/07/15 21:04:02 INFO master.ZooKeeperConfigStore: Created new config at /flume-cfgs/cfg-0000000019
10/07/15 21:04:02 INFO master.ZooKeeperConfigStore: Config was updated - reloading
10/07/15 21:09:35 INFO server.ZooKeeperServer: Expiring session 0x129d953d26c0000, timeout of 5000ms exceeded
10/07/15 21:09:35 INFO server.ZooKeeperServer: Expiring session 0x129d953d26c0001, timeout of 5000ms exceeded
10/07/15 21:09:35 INFO server.PrepRequestProcessor: Processed session termination for sessionid: 0x129d953d26c0000
10/07/15 21:09:35 INFO server.PrepRequestProcessor: Processed session termination for sessionid: 0x129d953d26c0001
10/07/15 21:09:35 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 7841ms for sessionid 0x129d953d26c0001, closing socket connection and attempting reconnect
10/07/15 21:09:35 WARN server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x129d953d26c0001, likely client has closed socket
10/07/15 21:09:35 INFO server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:51489 which had sessionid 0x129d953d26c0001
10/07/15 21:09:35 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 7841ms for sessionid 0x129d953d26c0000, closing socket connection and attempting reconnect
10/07/15 21:09:35 WARN server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x129d953d26c0000, likely client has closed socket
10/07/15 21:09:35 INFO server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:51488 which had sessionid 0x129d953d26c0000
10/07/15 21:09:36 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
10/07/15 21:09:36 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
10/07/15 21:09:36 INFO server.NIOServerCnxn: Accepted socket connection from /127.0.0.1:38305
10/07/15 21:09:36 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
10/07/15 21:09:36 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
10/07/15 21:09:36 INFO server.NIOServerCnxn: Accepted socket connection from /127.0.0.1:38306
10/07/15 21:09:36 INFO server.NIOServerCnxn: Client attempting to renew session 0x129d953d26c0001 at /127.0.0.1:38306
10/07/15 21:09:36 INFO server.NIOServerCnxn: Invalid session 0x129d953d26c0001 for client /127.0.0.1:38306, probably expired
10/07/15 21:09:36 INFO zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x129d953d26c0001 has expired, closing socket connection
10/07/15 21:09:36 INFO server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:38306 which had sessionid 0x129d953d26c0001
10/07/15 21:11:11 INFO server.NIOServerCnxn: Client attempting to renew session 0x129d953d26c0000 at /127.0.0.1:38305
10/07/15 21:11:11 INFO server.NIOServerCnxn: Invalid session 0x129d953d26c0000 for client /127.0.0.1:38305, probably expired
10/07/15 21:11:11 INFO server.NIOServerCnxn: Closed socket connection for client /127.0.0.1:38305 which had sessionid 0x129d953d26c0000
10/07/15 21:11:11 INFO zookeeper.ClientCnxn: Client session timed out, have not heard from server in 95376ms for sessionid 0x129d953d26c0000, closing socket connection and attempting reconnect
10/07/15 21:11:11 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181
10/07/15 21:11:11 INFO zookeeper.ClientCnxn: Socket connection established to localhost/0:0:0:0:0:0:0:1:2181, initiating session
10/07/15 21:11:11 INFO server.NIOServerCnxn: Accepted socket connection from /0:0:0:0:0:0:0:1:37856
10/07/15 21:11:11 INFO server.NIOServerCnxn: Client attempting to renew session 0x129d953d26c0000 at /0:0:0:0:0:0:0:1:37856
10/07/15 21:11:11 INFO server.NIOServerCnxn: Invalid session 0x129d953d26c0000 for client /0:0:0:0:0:0:0:1:37856, probably expired
10/07/15 21:11:11 INFO zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x129d953d26c0000 has expired, closing socket connection
10/07/15 21:11:11 INFO server.NIOServerCnxn: Closed socket connection for client /0:0:0:0:0:0:0:1:37856 which had sessionid 0x129d953d26c0000
10/07/15 23:33:42 INFO master.CommandManager: Submitting command: refreshAll []
10/07/15 23:33:42 INFO master.CommandManager: Executing command: refreshAll []
10/07/15 23:33:42 WARN master.CommandManager: During cmdid:6 refreshAll [] : Not connected to ZooKeeper: CLOSED