Details
Type: Bug
Status: Resolved
Priority: Critical
Resolution: Fixed
Description
Env: HDC Spark data science cluster (m4.4xlarge: 16 CPU / 64 GB)
Spark defaults are not adjusted in Ambari; the cluster comes up with the 1 GB default for spark.executor.memory.
(Should this be 60-70% of the YARN minimum container size? spark.yarn.executor.memoryOverhead also needs to be taken into account.)
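A minimal sketch of the sizing arithmetic this implies, assuming a 65% fraction and Spark's default overhead formula of max(384 MB, 10% of executor memory); the function name and the exact fraction are illustrative, not existing Ambari stack-advisor code:

def recommend_executor_memory_mb(yarn_min_container_mb, fraction=0.65):
    # Size spark.executor.memory at ~60-70% of the YARN minimum container
    # size (65% assumed here), leaving headroom for the memory overhead.
    executor_memory_mb = int(yarn_min_container_mb * fraction)
    # Spark's default spark.yarn.executor.memoryOverhead:
    # max(384 MB, 10% of executor memory).
    overhead_mb = max(384, executor_memory_mb // 10)
    # Executor heap plus overhead must fit inside the YARN container.
    assert executor_memory_mb + overhead_mb <= yarn_min_container_mb
    return executor_memory_mb, overhead_mb

# Example: a 6144 MB minimum container gives a ~3993 MB executor heap.
print(recommend_executor_memory_mb(6144))  # (3993, 399)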
Add logic for "spark.shuffle.io.numConnectionsPerPeer":
spark.shuffle.io.numConnectionsPerPeer should be configured dynamically based on cluster size. The recommendation was to set it to 10 when the cluster has fewer than 10 nodes, and to remove the property (so that the default value is used) on larger clusters.
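A minimal sketch of that recommendation, assuming the spark-defaults properties are handled as a plain dict; the function name is illustrative, not the actual Ambari stack-advisor API:

def recommend_num_connections_per_peer(num_nodes, spark_defaults):
    # On clusters with fewer than 10 nodes, pin the property to 10;
    # otherwise drop it so Spark's built-in default applies.
    key = "spark.shuffle.io.numConnectionsPerPeer"
    if num_nodes < 10:
        spark_defaults[key] = "10"
    else:
        spark_defaults.pop(key, None)
    return spark_defaults

# Example: a 5-node cluster gets the override, a 20-node cluster does not.
print(recommend_num_connections_per_peer(5, {}))
print(recommend_num_connections_per_peer(20, {"spark.shuffle.io.numConnectionsPerPeer": "10"}))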