Details
-
Bug
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
1.2.3
-
None
Description
Cluster installation fails on Datanode with task timeout aproximately in 5 minutes. It is not an UI issue, I've checked via API that server considers the request status to be TIMEDOUT, and agent continues running puppet manifests (each resulting with 0 return code). Agent logs does not contain watchdog messages. Our local internet connection is slower then Amazon internal network, and as a result installation takes more time (in my case, this issue reproduces almost every time when installing on a 2-node cluster). Clicking retry results in successful installation due to cached packages on previous attempt.
We have different timeout values at agent and server (10 minutes and 5 minutes).
/src/main/python/ambari_agent/PuppetExecutor.py:42
PUPPET_TIMEOUT_SECONDS = 600
com.google.inject.AbstractModule#bindConstant
bindConstant().annotatedWith(Names.named("actionTimeout")).to(300000L);