Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
2.1.0
None
Description
Spark jobs run on a YARN cluster in our warehouse, with the external shuffle service enabled (--conf spark.shuffle.service.enabled=true). Recently the NodeManager has been running out of memory (OOM) from time to time. A heap dump shows that OneForOneStreamManager has a very large footprint: the NodeManager is configured with a 5G heap, of which OneForOneStreamManager occupies 2.5G, and the heap contains 5503233 FileSegmentManagedBuffer objects.

Are there any suggestions for avoiding this, other than continually increasing the NodeManager's memory? Is it possible to stop calling registerStream in OneForOneStreamManager, so that we don't need to cache so much metadata (i.e. StreamState)?
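For scale, 2.5G spread over 5503233 objects is roughly 490 bytes of OneForOneStreamManager-retained heap per FileSegmentManagedBuffer (taking 1G as 2^30 bytes). The toy model below is a hypothetical sketch, not Spark's API (Spark's real OneForOneStreamManager is Java). It illustrates why eagerly building every buffer before registering a stream is costly, and one possible mitigation, assumed here rather than taken from this issue: register a lazy iterator so each buffer object is created only when its chunk is fetched. All names (SegmentBuffer, BufferFactory, ToyStreamManager, eager_buffers, lazy_buffers) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass
class SegmentBuffer:
    """Hypothetical stand-in for FileSegmentManagedBuffer: a (file, offset, length) view."""
    path: str
    offset: int
    length: int


class BufferFactory:
    """Creates buffers and counts how many have been constructed so far."""

    def __init__(self) -> None:
        self.created = 0

    def make(self, block_id: int) -> SegmentBuffer:
        self.created += 1
        # Paths, offsets and lengths are invented for the illustration.
        return SegmentBuffer(f"shuffle_{block_id}.data", block_id * 1024, 1024)


class ToyStreamManager:
    """Hypothetical, simplified one-for-one stream manager: every registered
    stream keeps its buffer iterator in memory until chunks are fetched."""

    def __init__(self) -> None:
        self._streams: Dict[int, Iterator[SegmentBuffer]] = {}
        self._next_id = 0

    def register_stream(self, buffers: Iterator[SegmentBuffer]) -> int:
        stream_id = self._next_id
        self._next_id += 1
        self._streams[stream_id] = buffers
        return stream_id

    def get_chunk(self, stream_id: int) -> SegmentBuffer:
        # Raises StopIteration once the stream has no more chunks.
        return next(self._streams[stream_id])


def eager_buffers(factory: BufferFactory, block_ids: Iterable[int]) -> Iterator[SegmentBuffer]:
    # Builds every buffer up front; the list iterator keeps the whole list
    # alive for as long as the stream stays registered.
    return iter([factory.make(b) for b in block_ids])


def lazy_buffers(factory: BufferFactory, block_ids: Iterable[int]) -> Iterator[SegmentBuffer]:
    # Builds each buffer only when the corresponding chunk is requested.
    return (factory.make(b) for b in block_ids)


if __name__ == "__main__":
    eager, lazy = BufferFactory(), BufferFactory()
    manager = ToyStreamManager()
    manager.register_stream(eager_buffers(eager, range(1000)))
    lazy_id = manager.register_stream(lazy_buffers(lazy, range(1000)))
    print(eager.created, lazy.created)  # 1000 0
    manager.get_chunk(lazy_id)
    print(lazy.created)  # 1
```

In this model, memory held by a registered-but-unfetched stream grows with the number of blocks in the eager case, while the lazy case holds only the block identifiers until a chunk is actually requested.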