Description
Several users have reported NameNode failovers (flip-overs) caused by either slow ZooKeeper disks (high fsync cost) or network issues. We could avoid these failovers to some extent by increasing the HA session timeout, ha.zookeeper.session-timeout.ms.
The default value of 5000 ms seems very low for a production environment. I suggest 10000 ms as the default session timeout.
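Until the default changes, an operator could apply the suggested value per cluster. A minimal sketch, assuming the property is set in core-site.xml and that the ZooKeeper servers accept a 10000 ms session (the server's own minimum and maximum session timeouts can override what the client requests):

```xml
<!-- core-site.xml: raise the ZKFC session timeout from 5000 ms to the suggested 10000 ms -->
<property>
  <name>ha.zookeeper.session-timeout.ms</name>
  <value>10000</value>
</property>
```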
..
2018-05-04 03:54:36,848 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from server in 4689ms for sessionid 0x260e24bac500aa3, closing socket connection and attempting reconnect
2018-05-04 03:56:49,088 INFO zookeeper.ClientCnxn (ClientCnxn.java:run(1140)) - Client session timed out, have not heard from server in 3981ms for sessionid 0x360fd152b8700fe, closing socket connection and attempting reconnect
..
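To relate the silent intervals in these messages to a candidate timeout, the following is a rough sketch in Python. It assumes the ZooKeeper client closes the connection once the server has been silent for about two thirds of the negotiated session timeout, and that the negotiated value equals the configured one. The helper names (silent_intervals, read_timeout_ms, would_time_out) are hypothetical and not part of Hadoop or ZooKeeper.

```python
import re

# Matches the ZooKeeper client timeout message quoted in the log excerpt above.
TIMEOUT_RE = re.compile(
    r"have not heard from server in (\d+)ms for sessionid (0x[0-9a-f]+)"
)


def silent_intervals(log_text):
    """Return (sessionid, silent_ms) pairs for every client timeout message."""
    return [(m.group(2), int(m.group(1))) for m in TIMEOUT_RE.finditer(log_text)]


def read_timeout_ms(session_timeout_ms):
    # Assumption: the client gives up on a silent server after roughly
    # two thirds of the session timeout.
    return session_timeout_ms * 2 // 3


def would_time_out(silent_ms, session_timeout_ms):
    """True if a silence of silent_ms reaches the assumed client read timeout."""
    return silent_ms >= read_timeout_ms(session_timeout_ms)
```

Under that assumption, both intervals in the excerpt (4689 ms and 3981 ms) exceed the read timeout derived from 5000 ms but stay below the one derived from 10000 ms.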