HBASE-24544

Recommend upping zk jute.maxbuffer in all but minor installs



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: documentation
    • Labels: None


      Add a doc note in the upgrade and zookeeper sections recommending upping zk jute.maxbuffer above the default of just under 1MB.

      Here is the description of jute.maxbuffer from the ZooKeeper doc:

      (Java system property: jute.maxbuffer)
      This option can only be set as a Java system property. There is no zookeeper prefix on it. It specifies the maximum size of the data that can be stored in a znode. The default is 0xfffff, or just under 1M. If this option is changed, the system property must be set on all servers and clients otherwise problems will arise. This is really a sanity check. ZooKeeper is designed to store data on the order of kilobytes in size.

      It seems easy enough to blow the 1MB default. Here is one such scenario. A peer is disabled so WALs back up on each RegionServer, or a bug makes it so we don't clear WALs out from under the RegionServer promptly. Backed-up WALs get into the hundreds, easy enough on a busy cluster. Next, there is a power outage and the cluster crashes.

      Recovery may then require a ServerCrashProcedure (SCP) recovering hundreds of WALs. As our SCP works today, we can end up with a /hbase/splitWAL dir holding hundreds, even thousands, of WALs. The 1MB buffer limit in zk cannot carry a listing that big.
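
      To see roughly how quickly such a listing outgrows the default, here is a minimal back-of-the-envelope sketch; the WAL count and per-entry size are assumptions for illustration, not measured values:

          // Rough estimate of the size of a getChildren() response on /hbase/splitWAL.
          // walCount and bytesPerEntry are assumed figures for illustration only.
          public class SplitWalListingEstimate {
            public static void main(String[] args) {
              int walCount = 5000;           // WALs queued for splitting after a crash
              int bytesPerEntry = 250;       // a URL-encoded WAL path easily runs 200+ bytes
              long listingBytes = (long) walCount * bytesPerEntry;
              long juteMaxBuffer = 0xfffff;  // ZooKeeper default, 1,048,575 bytes
              System.out.println("listing ~" + listingBytes + " bytes; jute.maxbuffer " + juteMaxBuffer);
              // ~1,250,000 bytes > 1,048,575 bytes: a response of this size is rejected,
              // so the listing of /hbase/splitWAL fails until jute.maxbuffer is raised.
            }
          }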

      Of note, jute.maxbuffer needs to be set on the zk servers (with a restart so the change is picked up) and on the client side, in the hbase master at least.
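
      As a sketch of where the property might be set, assuming a plain tarball-style deploy (the exact files and variable names depend on how ZooKeeper and HBase are managed in a given install):

          # ZooKeeper server side: e.g. in conf/java.env, picked up by zkEnv.sh
          SERVER_JVMFLAGS="$SERVER_JVMFLAGS -Djute.maxbuffer=4194304"

          # HBase client side (the master runs the SCP): e.g. in conf/hbase-env.sh
          export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Djute.maxbuffer=4194304"

      The 4MB value above is only an example; the point is that servers and clients must agree on the setting and both processes need a restart for it to take effect.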

      This issue is about highlighting this old limitation in our doc; it seems to be totally absent at present.


              • Assignee:
                Michael Stack (stack)
              • Votes:
                0
              • Watchers:
                5


                • Created: