LIVY-660

How can we use YARN and all the nodes in our cluster when submitting a PySpark job?


    Details

    • Type: Question
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.6.0
    • Fix Version/s: None
    • Component/s: Server
    • Labels:
      None

      Description

How can we use YARN and all the nodes in our cluster when submitting a PySpark job?

We have edited all the required .conf files, but nothing happens.

       

       

[root@cdh-node06 conf]# cat livy-client.conf

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Use this keystore for the SSL certificate and key.
# livy.keystore =
dew0wf-e

# Specify the keystore password.
# livy.keystore.password =
#
welfka

# Specify the key password.
# livy.key-password =

# Hadoop Credential Provider Path to get "livy.keystore.password" and "livy.key-password".
# Credential Provider can be created using command as follow:
# hadoop credential create "livy.keystore.password" -value "secret" -provider jceks://hdfs/path/to/livy.jceks
# livy.hadoop.security.credential.provider.path =

# What host address to start the server on. By default, Livy will bind to all network interfaces.
# livy.server.host = 0.0.0.0

# What port to start the server on.
# livy.server.port = 8998

# What base path ui should work on. By default UI is mounted on "/".
# E.g.: livy.ui.basePath = /my_livy - result in mounting UI on /my_livy/
# livy.ui.basePath = ""

# What spark master Livy sessions should use.
livy.spark.master = yarn

# What spark deploy mode Livy sessions should use.
livy.spark.deploy-mode = cluster

# Configure Livy server http request and response header size.
# livy.server.request-header.size = 131072
# livy.server.response-header.size = 131072

# Enabled to check whether timeout Livy sessions should be stopped.
# livy.server.session.timeout-check = true

# Time in milliseconds on how long Livy will wait before timing out an idle session.
# livy.server.session.timeout = 1h
#
# How long a finished session state should be kept in LivyServer for query.
# livy.server.session.state-retain.sec = 600s

# If livy should impersonate the requesting users when creating a new session.
# livy.impersonation.enabled = true

# Logs size livy can cache for each session/batch. 0 means don't cache the logs.
# livy.cache-log.size = 200

# Comma-separated list of Livy RSC jars. By default Livy will upload jars from its installation
# directory every time a session is started. By caching these files in HDFS, for example, startup
# time of sessions on YARN can be reduced.
# livy.rsc.jars =

# Comma-separated list of Livy REPL jars. By default Livy will upload jars from its installation
# directory every time a session is started. By caching these files in HDFS, for example, startup
# time of sessions on YARN can be reduced. Please list all the repl dependencies including
# Scala version-specific livy-repl jars, Livy will automatically pick the right dependencies
# during session creation.
# livy.repl.jars =

# Location of PySpark archives. By default Livy will upload the file from SPARK_HOME, but
# by caching the file in HDFS, startup time of PySpark sessions on YARN can be reduced.
# livy.pyspark.archives =

# Location of the SparkR package. By default Livy will upload the file from SPARK_HOME, but
# by caching the file in HDFS, startup time of R sessions on YARN can be reduced.
# livy.sparkr.package =

# List of local directories from where files are allowed to be added to user sessions. By
# default it's empty, meaning users can only reference remote URIs when starting their
# sessions.
# livy.file.local-dir-whitelist =

# Whether to enable csrf protection, by default it is false. If it is enabled, client should add
# http-header "X-Requested-By" in request if the http method is POST/DELETE/PUT/PATCH.
# livy.server.csrf-protection.enabled =

# Whether to enable HiveContext in livy interpreter, if it is true hive-site.xml will be detected
# on user request and then livy server classpath automatically.
# livy.repl.enable-hive-context =

# Recovery mode of Livy. Possible values:
# off: Default. Turn off recovery. Every time Livy shuts down, it stops and forgets all sessions.
# recovery: Livy persists session info to the state store. When Livy restarts, it recovers
#           previous sessions from the state store.
# Must set livy.server.recovery.state-store and livy.server.recovery.state-store.url to
# configure the state store.
# livy.server.recovery.mode = off

# Where Livy should store state to for recovery. Possible values:
# <empty>: Default. State store disabled.
# filesystem: Store state on a file system.
# zookeeper: Store state in a Zookeeper instance.
# livy.server.recovery.state-store =

# For filesystem state store, the path of the state store directory. Please don't use a filesystem
# that doesn't support atomic rename (e.g. S3). e.g. file:///tmp/livy or hdfs:///.
# For zookeeper, the address to the Zookeeper servers. e.g. host1:port1,host2:port2
# livy.server.recovery.state-store.url =

# If Livy can't find the yarn app within this time, consider it lost.
# livy.server.yarn.app-lookup-timeout = 120s
# When the cluster is busy, we may fail to launch yarn app in app-lookup-timeout, then it would
# cause session leakage, so we need to check session leakage.
# How long to check livy session leakage
# livy.server.yarn.app-leakage.check-timeout = 600s
# how often to check livy session leakage
# livy.server.yarn.app-leakage.check-interval = 60s

# How often Livy polls YARN to refresh YARN app state.
# livy.server.yarn.poll-interval = 5s
#
# Days to keep Livy server request logs.
# livy.server.request-log-retain.days = 5

# If the Livy Web UI should be included in the Livy Server. Enabled by default.
# livy.ui.enabled = true

# Whether to enable Livy server access control, if it is true then all the income requests will
# be checked if the requested user has permission.
# livy.server.access-control.enabled = false

# Allowed users to access Livy, by default any user is allowed to access Livy. If user want to
# limit who could access Livy, user should list all the permitted users with comma separated.
# livy.server.access-control.allowed-users = *

# A list of users with comma separated has the permission to change other user's submitted
# session, like submitting statements, deleting session.
# livy.server.access-control.modify-users =

# A list of users with comma separated has the permission to view other user's infomation, like
# submitted session state, statement results.
# livy.server.access-control.view-users =
#
# Authentication support for Livy server
# Livy has a built-in SPnego authentication support for HTTP requests with below configurations.
# livy.server.auth.type = kerberos
# livy.server.auth.kerberos.principal = <spnego principal>
# livy.server.auth.kerberos.keytab = <spnego keytab>
# livy.server.auth.kerberos.name-rules = DEFAULT
#
# If user wants to use custom authentication filter, configurations are:
# livy.server.auth.type = <custom>
# livy.server.auth.<custom>.class = <class of custom auth filter>
# livy.server.auth.<custom>.param.<foo1> = <bar1>
# livy.server.auth.<custom>.param.<foo2> = <bar2>

export JAVA_HOME=/usr/java/jdk1.8.0_121-cloudera/jre/
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark/
export SPARK_CONF_DIR=$SPARK_HOME/conf
export HADOOP_HOME=/etc/hadoop/
export HADOOP_CONF_DIR=/etc/hadoop/conf
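With livy.spark.master = yarn and livy.spark.deploy-mode = cluster set as above, how many nodes a job actually uses is decided mostly by the Spark resource settings sent with each job (executor count, cores, memory, or dynamic allocation), not by Livy itself. Below is a minimal sketch of submitting a PySpark batch through Livy's REST API (POST /batches) with explicit executor sizing; the host name, HDFS path, and all numbers are made-up placeholders for illustration, not values from this cluster:

```python
import json

# Hypothetical Livy endpoint; substitute your own server and port.
LIVY_URL = "http://cdh-node06:8998/batches"

payload = {
    # Hypothetical PySpark script already uploaded to HDFS.
    "file": "hdfs:///user/spark/jobs/my_job.py",
    "conf": {
        # Fix the executor count explicitly ...
        "spark.executor.instances": "6",
        # ... or, instead, let YARN scale it (needs the external
        # shuffle service on the NodeManagers):
        # "spark.dynamicAllocation.enabled": "true",
    },
    # Example sizing: e.g. one executor per NodeManager.
    "numExecutors": 6,
    "executorCores": 4,
    "executorMemory": "4g",
}

# Show the request body that would be POSTed to Livy.
print(json.dumps(payload, indent=2))

# To actually submit (network call, shown here but commented out):
# import urllib.request
# req = urllib.request.Request(
#     LIVY_URL,
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

If the job still runs on a single node, it is worth checking the YARN ResourceManager UI for the application's executor placement, and confirming that the queue it lands in has enough capacity to grant the requested containers.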

       


            People

• Assignee: Unassigned
• Reporter: sebastys Sebastian Rama
• Votes: 0
• Watchers: 2
