Details
Type: Question
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.1.0
Fix Version/s: None
Description
I've set up a system as follows:
-- Mesos master: private IP 10.x.x.2, public IP 35.x.x.6
-- Mesos slave: private IP 192.x.x.10, public IP 111.x.x.2
The master assigned the task to the slave successfully, but the task then failed with the following error:
Exception in thread "main" 17/10/11 22:38:01 ERROR RpcOutboxMessage: Ask timeout before connecting successfully
Caused by: org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in 120 seconds. This timeout is controlled by spark.rpc.askTimeout
On the environment page, spark.driver.host points to the master's private IP address (10.x.x.2) instead of its public IP address (35.x.x.6). A Wireshark capture confirms this: there were failed TCP packets addressed to the master's private IP address.
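The same observation can be reproduced without a packet capture by attempting a plain TCP connection from the slave to the address the driver advertises. This is only a minimal sketch: the helper name `can_connect` is made up for illustration, and the port has to be taken from the value of spark.driver.port shown on the environment page.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, DNS failures
        # and timeouts (socket.timeout is a subclass of OSError).
        return False


# Example use on the slave (address and port are placeholders, not real values):
#   can_connect("<driver address from the environment page>", <spark.driver.port>)
```

If the call returns False for the private address but True for the public one, the executors are being pointed at an address they cannot route to.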
If I instead set spark.driver.bindAddress on the master to its local IP address and spark.driver.host to its public IP address, I get the following message:
ERROR TaskSchedulerImpl: Lost executor 1 on myhostname.singnet.com.sg: Unable to create executor due to Cannot assign requested address.
From my understanding, spark.driver.bindAddress is applied on both the master and the slave, so the slave fails with this error, presumably because it cannot bind to an address that belongs to the master. How do I properly set up Spark to run on this cluster over public IPs?
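For reference, the second attempt described above can be sketched as a spark-submit command line. This is an illustration only, not the exact command that was run: the helper names, the Mesos port 5050 and the application file `app.py` are assumptions, and the masked addresses are kept as placeholders.

```python
from typing import List


def driver_network_conf(bind_address: str, advertised_host: str) -> List[str]:
    """Build the --conf arguments that separate the driver's bind address
    from the address it advertises to executors."""
    return [
        "--conf", f"spark.driver.bindAddress={bind_address}",
        "--conf", f"spark.driver.host={advertised_host}",
    ]


def submit_command(mesos_master: str, app: str,
                   bind_address: str, advertised_host: str) -> List[str]:
    """Assemble a spark-submit argument list for a Mesos master URL."""
    return [
        "spark-submit",
        "--master", f"mesos://{mesos_master}",
        *driver_network_conf(bind_address, advertised_host),
        app,
    ]


if __name__ == "__main__":
    # Placeholders: masked addresses from the description, assumed port and app name.
    cmd = submit_command("10.x.x.2:5050", "app.py",
                         bind_address="10.x.x.2",
                         advertised_host="35.x.x.6")
    print(" ".join(cmd))
```

The intent of this layout is that the driver listens on an address that exists on its own machine while telling executors to reach it at the public one.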