Details
- Bug
- Status: Resolved
- Major
- Resolution: Fixed
Description
I ran several tests experimenting with Samza on a 36-node cluster. I have the following observations:
1. On a cluster with about 50% utilization, the percentage of requests that are mapped to preferred hosts seems to depend on yarn.container.count. The percentage is higher when yarn.container.count is comparable to the size of the cluster.
For example, on the 36-node cluster, about 50% of requests are matched when yarn.container.count is 30, but only 27% of requests are matched when yarn.container.count is 10.
One reason is that when a large number of containers is spawned initially, many requests are made in bulk in quick succession, so there is a good chance that any random host in the cluster will match one of the preferred-host requests. However, when a single container is respawned after a failure, there is only one request (for the failed container), and it has a lower chance of a match.
The results are averaged across 20 runs in each scenario.
2. On a cluster with near-zero utilization, 100% of requests are matched to preferred hosts irrespective of yarn.container.count.
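The reasoning in observation 1 can be illustrated with a toy model. This is only a sketch under a strong assumption that is mine and not measured: YARN grants each container on a uniformly random host, ignoring utilization and scheduler behaviour. The class and method names are hypothetical, and the model is not meant to reproduce the measured 50% and 27% figures. It only shows why one outstanding request is much less likely to match than many.

```java
// Toy model only. Assumption: each granted container lands on a uniformly
// random host in the cluster, which ignores utilization and the scheduler.
final class PreferredHostMatchModel {
    private PreferredHostMatchModel() {}

    /**
     * Probability that a single granted container lands on a host named by
     * at least one outstanding preferred-host request.
     *
     * @param distinctPreferredHosts number of distinct hosts with an outstanding request
     * @param clusterSize            total number of hosts in the cluster
     */
    static double matchProbability(int distinctPreferredHosts, int clusterSize) {
        if (clusterSize <= 0) {
            throw new IllegalArgumentException("clusterSize must be positive");
        }
        if (distinctPreferredHosts < 0 || distinctPreferredHosts > clusterSize) {
            throw new IllegalArgumentException("distinctPreferredHosts must be in [0, clusterSize]");
        }
        return (double) distinctPreferredHosts / clusterSize;
    }

    public static void main(String[] args) {
        // Bulk start: many distinct preferred hosts are outstanding at once.
        System.out.printf("30 outstanding requests: %.3f%n", matchProbability(30, 36));
        // Restart after a failure: only one preferred host is outstanding.
        System.out.printf("1 outstanding request:   %.3f%n", matchProbability(1, 36));
    }
}
```

Under this assumption, a grant during a bulk start with 30 distinct preferred hosts matches with probability 30/36, while a grant for a single restarted container matches with probability 1/36.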
This ticket is to explore alternatives and see whether they improve the percentage of requests matched to preferred hosts.
I believe these ideas are worth trying:
1. YARN supports a 'relaxed locality' flag that can be specified with the request. We could set 'relaxed locality' to false. (This ensures that the container is allocated on exactly the host we ask for.) If we don't get such a container within a timeout, we may re-issue the same request with 'relaxed locality' set to true (as we currently do).
2. Re-issue the same preferred-host request if the hosts returned don't match the request.
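A minimal sketch of the decision logic behind idea 1, to make the proposal concrete. Everything here is hypothetical: HostRequest, Action, decide and relaxed are illustrative names, not Samza or YARN classes. In YARN's client API the flag corresponds to the relaxLocality argument of the AMRMClient.ContainerRequest constructor, but this sketch deliberately avoids the real API so that it stays self-contained.

```java
// Hypothetical sketch: strict-locality request first, relaxed fallback after a timeout.
final class LocalityFallbackSketch {
    private LocalityFallbackSketch() {}

    enum Action { RUN_CONTAINER, KEEP_WAITING, REISSUE_RELAXED }

    /** Stand-in for a container request; not a YARN or Samza class. */
    static final class HostRequest {
        final String preferredHost;
        final boolean relaxLocality;
        final long issuedAtMs;

        HostRequest(String preferredHost, boolean relaxLocality, long issuedAtMs) {
            this.preferredHost = preferredHost;
            this.relaxLocality = relaxLocality;
            this.issuedAtMs = issuedAtMs;
        }
    }

    /**
     * Decide what to do with an outstanding request.
     *
     * @param grantedHost host of a container offered for this request, or null if none yet
     */
    static Action decide(HostRequest request, String grantedHost, long nowMs, long timeoutMs) {
        if (grantedHost != null
                && (request.relaxLocality || grantedHost.equals(request.preferredHost))) {
            // A strict request is satisfied only by its preferred host;
            // a relaxed request accepts any host.
            return Action.RUN_CONTAINER;
        }
        boolean expired = nowMs - request.issuedAtMs >= timeoutMs;
        if (!request.relaxLocality && expired) {
            // The strict request timed out: fall back to relaxed locality.
            return Action.REISSUE_RELAXED;
        }
        return Action.KEEP_WAITING;
    }

    /** Build the relaxed re-issue of a strict request, stamped with the current time. */
    static HostRequest relaxed(HostRequest strict, long nowMs) {
        return new HostRequest(strict.preferredHost, true, nowMs);
    }
}
```

The caller would issue a strict HostRequest, poll decide on each allocation callback or heartbeat, and replace the request with relaxed(...) when REISSUE_RELAXED is returned. Idea 2 could reuse the same shape by re-issuing the original request when the granted host does not match.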
Attachments
Issue Links
- depends upon: SAMZA-922 Host Affinity - Bug in SamzaContainerRequest causes (recoverable) exceptions in YARN (Resolved)