The IPC handler uses a ByteArrayOutputStream to serialize each response. The buffer doubles in size whenever it runs out of space, so any large response expands it and results in large heap usage. Because the handler keeps the same buffer instead of replacing it, this memory is never reclaimed by GC.
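The retention can be reproduced outside the IPC code with a minimal Java sketch. It assumes the handler reuses one stream and calls reset() between responses. CapacityProbe and BufferRetentionDemo are hypothetical names, not Hadoop classes, and the 10 KB initial size and 4 MB response are illustrative values only. The probe exposes the length of the protected buf array that ByteArrayOutputStream keeps internally.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper (not Hadoop code): exposes the size of the backing array.
class CapacityProbe extends ByteArrayOutputStream {
    CapacityProbe(int initialSize) {
        super(initialSize);
    }

    // Length of the internal byte[]; this is what occupies heap, not size().
    int capacity() {
        return buf.length;
    }
}

class BufferRetentionDemo {
    // Writes one large "response", resets the stream, and returns
    // {capacity before reset, capacity after reset}.
    static int[] capacityAroundReset(int initialSize, int responseBytes) {
        CapacityProbe out = new CapacityProbe(initialSize);
        byte[] chunk = new byte[1024];
        for (int written = 0; written < responseBytes; written += chunk.length) {
            out.write(chunk, 0, chunk.length);
        }
        int before = out.capacity();
        out.reset(); // sets the byte count back to 0 but keeps the array
        int after = out.capacity();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] c = capacityAroundReset(10 * 1024, 4 * 1024 * 1024);
        System.out.println("capacity before reset: " + c[0]);
        System.out.println("capacity after reset:  " + c[1]);
    }
}
```

The two values are equal: reset() makes the stream logically empty, but the enlarged array stays reachable for as long as the stream itself is referenced.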
On namenode, the following RPC calls have huge impact on the buffer growth:
- public BlocksWithLocations NameNodeProtocol.getBlocks(DatanodeInfo datanode, long size)
- public DatanodeInfo ClientProtocol.getDatanodeReport(FSConstants.DatanodeReportType type)
- public FileStatus ClientProtocol.getListing(String src)
Of these, I think getBlocks() is the issue. On production clusters I saw this heap usage grow to 167MB. However, this does not explain buffers growing to the size of 167MB (the byte array grew to as many as 167772160 entries). Given responses this large, whenever the response buffer grows to a large size we should relinquish the old buffer, create a new one, and let the old buffer be garbage collected. This is safe because the buffer is currently used only as intermediate storage for serializing the response.
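The proposal could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: ResponseBufferHolder, initialSize and maxRetainedSize are hypothetical names, and the threshold value is left to the caller. ByteArrayOutputStream does not expose its capacity, so the sketch uses the size of the serialized response as a proxy.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch (not the actual Hadoop patch): a per-handler response
// buffer that is replaced, rather than reset, after an oversized response.
class ResponseBufferHolder {
    private final int initialSize;
    private final int maxRetainedSize;
    private ByteArrayOutputStream buf;

    ResponseBufferHolder(int initialSize, int maxRetainedSize) {
        this.initialSize = initialSize;
        this.maxRetainedSize = maxRetainedSize;
        this.buf = new ByteArrayOutputStream(initialSize);
    }

    // Stream the handler serializes the current response into.
    ByteArrayOutputStream buffer() {
        return buf;
    }

    // Copies out the serialized response and prepares the buffer for the next call.
    byte[] finishResponse() {
        byte[] response = buf.toByteArray();
        if (response.length > maxRetainedSize) {
            // Drop the reference to the grown buffer so GC can reclaim it.
            buf = new ByteArrayOutputStream(initialSize);
        } else {
            // Small response: keep the existing array to avoid reallocating.
            buf.reset();
        }
        return response;
    }
}
```

Because the check uses the response size rather than the real capacity, a buffer that has just doubled can still be retained at up to roughly twice maxRetainedSize. That still bounds the retained memory per handler, which is the point of the change.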