[SPARK-9825] Spark overwrites remote cluster "final" properties with local config

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: YARN
    • Labels: None

    Description

      Configuration options specified in the Hadoop cluster's *.xml config files can be marked as "final", indicating that they should not be overridden by a client's configuration. Spark appears to be overwriting those options; the symptom is that local proxy settings overwrite the cluster-side proxy settings, which breaks things when trying to run jobs on a remote, firewalled YARN cluster.
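
      For context, this is roughly how Hadoop's own Configuration class treats <final> when resources are layered on top of each other. The Scala snippet below is only a minimal sketch; the file paths are illustrative placeholders, not paths from this issue.

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.Path

      // Minimal sketch of Hadoop's <final> semantics: a key marked
      // <final>true</final> in an earlier resource is not replaced when a later
      // resource tries to set it; Hadoop logs "an attempt to override final
      // parameter: ...; Ignoring." instead (see the MapReduce log below).
      val conf = new Configuration(false)
      conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"))  // cluster config, marks the key final
      conf.addResource(new Path("/tmp/client-core-site.xml"))       // client config, tries to override it
      // Expect the cluster-side StandardSocketFactory value to survive.
      println(conf.get("hadoop.rpc.socket.factory.class.default"))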

      For example, with the configuration below, one should be able to establish a SOCKS proxy via ssh -D to a host that can "see" the cluster, and then submit jobs and run the driver on the local desktop/laptop:

      Remote cluster-side core-site.xml:

      <property>
          <name>hadoop.rpc.socket.factory.class.default</name>
          <value>org.apache.hadoop.net.StandardSocketFactory</value>
          <final>true</final>
      </property>
      

      This configuration ensures that the nodes within the cluster never use a proxy to talk to each other.

      Local client-side core-site.xml:

      <property>
          <name>hadoop.rpc.socket.factory.class.default</name>
          <value>org.apache.hadoop.net.SocksSocketFactory</value>
      </property>
      
      <property>
          <name>hadoop.socks.server</name>
          <value>localhost:9999</value>
      </property>
      

      Indeed, when running a standard MapReduce job, the log files show that an attempt to override a property marked <final> is detected and ignored:

      2015-07-27 15:26:11,706 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.rpc.socket.factory.class.default;  Ignoring.
      

      and the MR job proceeds and finishes normally.

      On the other hand, a Spark job with the same configuration shows no such message; instead, we see that the nodes within the cluster are unable to communicate:

      15/07/27 15:25:43 INFO client.RMProxy: Connecting to ResourceManager at node1/10.211.55.101:8030
      15/07/27 15:25:43 INFO yarn.YarnRMClient: Registering the ApplicationMaster
      15/07/27 15:25:44 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      

      Running tcpdump on the slave nodes shows that, in the case of the MR job, packets are sent directly between the slave nodes and the ResourceManager node (indicating that no proxy is being used), while in the case of the Spark job no such connection is made.
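
      Another way to check which configuration the YARN containers actually end up with is to run a trivial job that reports the resolved socket factory from each executor. This is only a hedged diagnostic sketch in Scala and is not part of the original report; the application name is arbitrary.

      import org.apache.hadoop.conf.Configuration
      import org.apache.spark.{SparkConf, SparkContext}

      // Diagnostic sketch: print the socket factory each container resolves, to
      // see whether the cluster-side <final> value survived the config merge.
      val sc = new SparkContext(new SparkConf().setAppName("socket-factory-check"))
      val factories = sc.parallelize(1 to 4, 4).map { _ =>
        val c = new Configuration()  // loads whatever *-site.xml the container sees on its classpath
        c.get("hadoop.rpc.socket.factory.class.default", "<unset>")
      }.collect()
      factories.distinct.foreach(println)
      sc.stop()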

      A further indication that the cluster-side configuration is being altered: if a dedicated proxy server is set up so that both sides can see it, i.e. the local core-site.xml is changed to have

      <property>
          <name>hadoop.socks.server</name>
          <value>node2:9999</value>
      </property>
      

      the Spark job (and the MR job) runs fine, with all connections going through the dedicated proxy server. While this works, it is sub-optimal: it requires setting up such a server, which may not always be possible since it needs privileged access to the gateway machine.

      Therefore, it appears that Spark is perfectly happy running through a proxy in YARN mode, but that it overrides the cluster-side configuration even when properties are marked as <final>. Is this intended? Or is there some other way to enforce that the "final" properties are preserved?
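
      One conceivable client-side guard, given purely as a hedged Scala sketch and not necessarily how Spark resolves this issue, would be to consult Hadoop's Configuration#getFinalParameters (assumed to be available in the Hadoop release in use) and skip any key the loaded resources have marked <final> before copying local overrides into the configuration that gets shipped to the cluster:

      import org.apache.hadoop.conf.Configuration

      // Hedged sketch: merge overrides into a Configuration while skipping any
      // key the loaded resources mark <final>, mimicking the warning MapReduce
      // prints above. mergeRespectingFinal is a hypothetical helper, not a
      // Spark API.
      def mergeRespectingFinal(target: Configuration, overrides: Map[String, String]): Unit = {
        val finalKeys = target.getFinalParameters  // java.util.Set[String] of keys marked <final>
        for ((key, value) <- overrides) {
          if (finalKeys.contains(key)) {
            System.err.println(s"Ignoring attempt to override final parameter: $key")
          } else {
            target.set(key, value)
          }
        }
      }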

            People

              Assignee: Marcelo Masiero Vanzin (vanzin)
              Reporter: Rok Roskar (rrrrrok)
              Votes: 0
              Watchers: 3
