[SPARK-9825] Spark overwrites remote cluster "final" properties with local config

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: YARN
    • Labels:
      None

      Description

      Configuration options specified in the Hadoop cluster *.xml config files can be marked as "final", indicating that they should not be overwritten by a client's configuration. Spark appears to be overwriting those options, the symptom of which is that local proxy settings overwrite the cluster-side proxy settings. This breaks things when trying to run jobs on a remote, firewalled, YARN cluster.

      For example, with the configuration below, one should be able to establish a SOCKS proxy via ssh -D to a host that can "see" the cluster, and then submit jobs and run the driver on the local desktop/laptop:

      Remote cluster-side core-site.xml:

      <property>
        <name>hadoop.rpc.socket.factory.class.default</name>
        <value>org.apache.hadoop.net.StandardSocketFactory</value>
        <final>true</final>
      </property>
      

      This configuration ensures that the nodes within the cluster never use a proxy to talk to each other.

      Local client-side core-site.xml:

      <property>
        <name>hadoop.rpc.socket.factory.class.default</name>
        <value>org.apache.hadoop.net.SocksSocketFactory</value>
      </property>

      <property>
        <name>hadoop.socks.server</name>
        <value>localhost:9999</value>
      </property>
      

      Indeed, when a standard MapReduce job is run, the log files show that an override of a property marked <final> is attempted:

      2015-07-27 15:26:11,706 WARN [main] org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final parameter: hadoop.rpc.socket.factory.class.default;  Ignoring.
      

      and the MR job proceeds and finishes normally.
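      The merge rule at work here can be illustrated with a minimal, self-contained mimic (this is not Hadoop's actual Configuration code; the class and method names below are made up for illustration): once a resource marks a key final, a later resource's value for that key is logged and discarded rather than applied.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal mimic of the <final> merge rule: a later resource may not
// override a key that an earlier resource marked <final>true</final>.
public class FinalPropsDemo {
    private final Map<String, String> props = new HashMap<>();
    private final Set<String> finalKeys = new HashSet<>();

    /** Load one (key, value, isFinal) entry from a resource file. */
    public void load(String key, String value, boolean isFinal) {
        if (finalKeys.contains(key)) {
            // Mirrors the warning seen in the MR job's logs.
            System.err.println("attempt to override final parameter: "
                    + key + "; Ignoring.");
            return;
        }
        props.put(key, value);
        if (isFinal) {
            finalKeys.add(key);
        }
    }

    public String get(String key) {
        return props.get(key);
    }

    public static void main(String[] args) {
        FinalPropsDemo conf = new FinalPropsDemo();
        // Cluster-side core-site.xml pins the standard socket factory as final.
        conf.load("hadoop.rpc.socket.factory.class.default",
                  "org.apache.hadoop.net.StandardSocketFactory", true);
        // Client-side core-site.xml tries to swap in the SOCKS factory.
        conf.load("hadoop.rpc.socket.factory.class.default",
                  "org.apache.hadoop.net.SocksSocketFactory", false);
        // The final (cluster) value wins, as it does for the MR job above.
        System.out.println(conf.get("hadoop.rpc.socket.factory.class.default"));
        // prints: org.apache.hadoop.net.StandardSocketFactory
    }
}
```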

      On the other hand, a Spark job with the same configuration shows no such message and instead we see that the nodes within the cluster are not able to communicate:

      15/07/27 15:25:43 INFO client.RMProxy: Connecting to ResourceManager at node1/10.211.55.101:8030
      15/07/27 15:25:43 INFO yarn.YarnRMClient: Registering the ApplicationMaster
      15/07/27 15:25:44 INFO ipc.Client: Retrying connect to server: node1/10.211.55.101:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
      

      Running tcpdump on the slave nodes shows that in the case of the MR job, packets are sent between slave nodes and the ResourceManager node indicating that no proxy is being used, while in the case of the Spark job no such connection is made.

      A further indication that the cluster-side configuration is altered is that if a dedicated proxy server is set up in a way that both sides can see it, i.e. the local core-site.xml is changed to have

      <property>
        <name>hadoop.socks.server</name>
        <value>node2:9999</value>
      </property>
      

      the Spark job (and the MR job) run fine, with all connections going through the dedicated proxy server. While this works, it's sub-optimal because it now requires that such a server be created, which may not always be possible because it requires privileged access to the gateway machine.

      Therefore, it appears that Spark is perfectly happy running through a proxy in YARN mode, but that it garbles the cluster-side configuration even when properties are marked as <final>. I'm not sure whether this is intended. Is there some other way to enforce that "final" properties are preserved?
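      For what it's worth, the behavior I'd expect from the client side can be sketched roughly like this (a hypothetical, self-contained sketch, not Spark's actual implementation or the approach in any linked pull request): before shipping a merged configuration to the cluster, drop any local override whose key the cluster marks final.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: strip local overrides for keys the cluster marks
// final, so the shipped configuration never shadows a final cluster value.
public class PreserveFinal {
    public static Map<String, String> merge(Map<String, String> cluster,
                                            Map<String, String> local,
                                            Set<String> clusterFinalKeys) {
        Map<String, String> merged = new HashMap<>(cluster);
        for (Map.Entry<String, String> e : local.entrySet()) {
            if (clusterFinalKeys.contains(e.getKey())) {
                continue; // cluster value is final; ignore the local override
            }
            merged.put(e.getKey(), e.getValue());
        }
        return merged;
    }
}
```

      With this rule, the local SocksSocketFactory setting would be dropped in favor of the cluster's final StandardSocketFactory, while non-final keys such as hadoop.socks.server would still pass through.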

        Issue Links

          Activity

          apachespark Apache Spark added a comment -

          User 'vanzin' has created a pull request for this issue:
          https://github.com/apache/spark/pull/18370
          rrrrrok Rok Roskar added a comment -

          I'm not sure who has the responsibility to honor the "final" property flag – client or cluster side? If the "final" designation is ignored in general, it has the potential to be problematic beyond just this use case.
          srowen Sean Owen added a comment -

          I don't believe Spark modifies any of these settings. Is that even possible in the Configuration object? It is, however, possible that something somewhere is managing to create a config without, somehow, the defaults configured in these files.

            People

            • Assignee:
              vanzin Marcelo Vanzin
              Reporter:
              rrrrrok Rok Roskar
            • Votes:
              0
              Watchers:
              3
