Found an issue in how RMProxy initializes its RetryPolicy, in RMProxy#createRetryPolicy: when rmConnectWaitMS is set to -1 (wait forever), it uses RetryPolicies.RETRY_FOREVER, which does not respect the yarn.resourcemanager.connect.retry-interval.ms setting.
RetryPolicies.RETRY_FOREVER uses 0 as the retry interval, so failed attempts are retried immediately with no sleep in between. When I ran TestYarnClient#testShouldNotRetryForeverForNonNetworkExceptions without the localhost name set up properly, it wrote 14G of DEBUG exception messages to the system before it died. This would be very bad if the same thing happened in a production cluster.
We should fix two places:
- Allow RETRY_FOREVER to take the retry interval as a constructor parameter.
- Respect the retry interval when we use the RETRY_FOREVER policy.
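
A rough sketch of what the two changes could look like together. This is not the actual Hadoop code: Policy, Decision, retryForeverWithFixedSleep, and this standalone createRetryPolicy are hypothetical stand-ins, used only to show a retry-forever policy that carries a configured interval instead of 0. The bounded branch is a simplification included just for contrast.

```java
import java.util.concurrent.TimeUnit;

public class RetryForeverSketch {

    /** Hypothetical retry decision: whether to retry, and how long to sleep first. */
    public static final class Decision {
        public final boolean retry;
        public final long delayMillis;

        public Decision(boolean retry, long delayMillis) {
            this.retry = retry;
            this.delayMillis = delayMillis;
        }
    }

    /** Hypothetical stand-in for a retry policy interface. */
    public interface Policy {
        Decision shouldRetry(Exception e, int retries);
    }

    /** Retries forever, but always sleeps the given interval between attempts. */
    public static Policy retryForeverWithFixedSleep(long sleepTime, TimeUnit unit) {
        if (sleepTime < 0) {
            throw new IllegalArgumentException("sleepTime must be >= 0");
        }
        final long delayMillis = unit.toMillis(sleepTime);
        return (e, retries) -> new Decision(true, delayMillis);
    }

    /**
     * Assumed shape of the fix: a connect wait of -1 means "wait forever",
     * and that case now honors the configured retry interval.
     */
    public static Policy createRetryPolicy(long rmConnectWaitMs, long retryIntervalMs) {
        if (rmConnectWaitMs == -1) {
            return retryForeverWithFixedSleep(retryIntervalMs, TimeUnit.MILLISECONDS);
        }
        // Simplified bounded case: give up once the wait budget is used up.
        final long maxRetries = retryIntervalMs > 0 ? rmConnectWaitMs / retryIntervalMs : 0;
        return (e, retries) -> new Decision(retries < maxRetries, retryIntervalMs);
    }
}
```

With this shape, a caller that sleeps for delayMillis before each retry can no longer spin in a tight loop when the wait is unbounded, which is what produced the flood of DEBUG output described above.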