Details
Type: Improvement
Priority: Major
Status: Resolved
Resolution: Fixed
Hadoop Flags: Reviewed
Description
Using jxray (www.jxray.com), I analyzed several heap dumps from a YARN Resource Manager running in a big cluster. The tool uncovered several sources of memory waste. One problem, which wastes more than a quarter of all memory, is a large number of duplicate LiteralByteString objects reachable via the following reference chain:
1,011,810K (26.9%): byte[]: 5416705 / 100% dup arrays (22108 unique)
↖com.google.protobuf.LiteralByteString.bytes
↖org.apache.hadoop.yarn.proto.YarnServerCommonServiceProtos$.credentialsForApp_
↖{j.u.ArrayList}
↖j.u.Collections$UnmodifiableRandomAccessList.c
↖org.apache.hadoop.yarn.proto.YarnServerCommonServiceProtos$NodeHeartbeatResponseProto.systemCredentialsForApps_
↖org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.NodeHeartbeatResponsePBImpl.proto
↖org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl.latestNodeHeartBeatResponse
↖org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FSSchedulerNode.rmNode
...
That is, reference chains like the one above collectively hold 5.4 million LiteralByteString objects in memory, of which only ~22 thousand are unique. Deduplicating these objects, e.g. with a Guava Interner instance, would save ~1GB of memory.
It looks like the main place where the above LiteralByteStrings are created and attached to SystemCredentialsForAppsProto objects is the addSystemCredentialsToProto() method in NodeHeartbeatResponsePBImpl.java. Adding a call to an interner there should fix the problem.
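The proposed fix can be sketched with a minimal interner. In production code Guava's Interners.newWeakInterner() would be the natural choice; the stdlib-only version below (the class name Interner is illustrative, not actual YARN code) just shows the idea: keep one canonical instance per distinct value and hand that instance back for every duplicate.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Minimal interner sketch (illustrative, not actual YARN code).
 * Keeps one canonical instance per distinct value; callers replace
 * their own reference with the canonical one, so the duplicate
 * copies become garbage-collectable.
 */
public class Interner<T> {
    private final ConcurrentMap<T, T> pool = new ConcurrentHashMap<>();

    /** Return the canonical copy of sample, registering it if new. */
    public T intern(T sample) {
        T canonical = pool.putIfAbsent(sample, sample);
        return canonical == null ? sample : canonical;
    }
}
```

In addSystemCredentialsToProto(), each credentials ByteString would be passed through such an interner before being set on the proto builder, so heartbeat responses for all nodes share one copy per distinct credential blob. A production interner should hold its entries via weak references, as Guava's weak interner does, so canonical copies can themselves be collected once no longer referenced.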