Details
- Type: Bug
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
Description
During controller failover and broker changes, the controller sends an UpdateMetadataRequest to every broker in the cluster, containing the state of all partitions and all live brokers. The current implementation instantiates the UpdateMetadataRequest object and its serialized form (Struct) once per broker, which causes an OOM error when the resulting memory usage exceeds the configured JVM heap size. We have seen this issue in production multiple times.
For example, if we have 100 brokers in the cluster and each broker is the leader of 2k partitions, the extra memory used by the controller while sending out the UpdateMetadataRequests is around:
<memory used per partition by the UpdateMetadataRequest Struct> * <# of brokers> * <total # of leader partitions>
= 250 B * 100 * 200k = 5 GB
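The estimate above can be reproduced with a short back-of-the-envelope calculation. This is only a sketch of the arithmetic: the function and variable names are hypothetical, and the 250 B figure is read here as the approximate per-partition size inside one Struct, which is an assumption about the formula rather than a measured value.

```python
# Rough estimate of the extra controller memory described above.
# All inputs come from the example in the description; the names are
# illustrative and the 250 B per-partition figure is an approximation.

BYTES_PER_PARTITION = 250            # approx. Struct bytes per partition state
NUM_BROKERS = 100                    # brokers in the cluster
LEADER_PARTITIONS_PER_BROKER = 2_000 # each broker leads 2k partitions


def request_size_bytes(bytes_per_partition, num_brokers, partitions_per_broker):
    """Approximate size of one UpdateMetadataRequest Struct."""
    total_partitions = num_brokers * partitions_per_broker
    return bytes_per_partition * total_partitions


def extra_memory_bytes(bytes_per_partition, num_brokers, partitions_per_broker):
    """Approximate total when one Struct is built per broker."""
    per_request = request_size_bytes(
        bytes_per_partition, num_brokers, partitions_per_broker
    )
    return per_request * num_brokers


per_request = request_size_bytes(
    BYTES_PER_PARTITION, NUM_BROKERS, LEADER_PARTITIONS_PER_BROKER
)
total = extra_memory_bytes(
    BYTES_PER_PARTITION, NUM_BROKERS, LEADER_PARTITIONS_PER_BROKER
)

print(f"per request: {per_request / 1e6:.0f} MB")
print(f"all brokers: {total / 1e9:.0f} GB")
```

Under these assumptions a single request is about 50 MB, and it is the multiplication by the number of brokers that pushes the total to roughly 5 GB.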