  Bigtop / BIGTOP-1658

puppet recipe updates for latest spark (1.3+)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.9.0
    • Fix Version/s: 1.1.0
    • Component/s: spark
    • Labels:
      None

      Description

      In BIGTOP-1648 we upgraded Spark, and now there are some updates we need to make to the puppet recipes. This is a critical blocker for the release, as we want puppet recipes to be first-class citizens reflecting the correct deployment of everything in Bigtop.

        Issue Links

          Activity

          YoungWoo Kim added a comment -

          In BIGTOP-1648, Apache Spark in Bigtop was updated to version 1.2.1. These are notes on installing the spark-history-server and spark-thriftserver.

          0) Install Spark history server

          # yum install -y spark-history-server
          

          1) To run spark-history-server, make sure you've created an event log directory on HDFS:

          su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -mkdir -p /var/log/spark/apps'
          su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chmod -R 1777 /var/log/spark/apps'
          su -s /bin/bash hdfs -c '/usr/bin/hadoop fs -chown spark:spark /var/log/spark/apps'
          

          2) Create '/etc/spark/conf/spark-defaults.conf':

          cd /etc/spark/conf
          cp spark-defaults.conf.template spark-defaults.conf
          

          Edit spark-defaults.conf:

          spark.master                     spark://SPARK-MASTER-HOSTNAME:7077
          spark.eventLog.enabled           true
          spark.eventLog.dir               hdfs://HDFS-NN-HOSTNAME:8020/var/log/spark/apps/
          

          3) Run the Spark examples on YARN:

          # service spark-history-server start
          export HADOOP_CONF_DIR=/etc/hadoop/conf
          
          spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client --master yarn /usr/lib/spark/lib/spark-examples_2.10-1.2.0.jar 2
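
          As an aside, here is what the SparkPi example actually computes: a Monte Carlo estimate of pi, sampling random points in the unit square and counting how many land inside the quarter circle. The standalone Python sketch below is illustrative only (it is not part of Bigtop or Spark, and the function name is made up); it reproduces the computation locally, whereas spark-submit distributes the sampling across YARN containers.

```python
import random

def estimate_pi(num_samples, seed=42):
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that fall inside the quarter circle, multiplied by 4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

if __name__ == "__main__":
    # The trailing "2" in the spark-submit command plays a similar role:
    # it scales how much sampling work the job does (via partitions).
    print(estimate_pi(100_000))
```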
          

          4) Browse the Spark history server UI:
          http://HOSTNAME:18082/


          Installation of spark-thriftserver (Optional)

          If you want JDBC access to Spark SQL through the Hive Thrift server, you have to run spark-thriftserver.

          1) Edit hive-site.xml for Spark SQL

          vi /etc/spark/conf/hive-site.xml
          

          Make sure the following properties are set up properly (in my case, I use MySQL as the metastore database):

          
          <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://localhost/hive?createDatabaseIfNotExist=true</value>
            <description>JDBC connect string for a JDBC metastore</description>
          </property>
           
          <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
            <description>Driver class name for a JDBC metastore</description>
          </property>
           
          <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>username</value>
            <description>username to use against metastore database</description>
          </property>
           
          <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>mypassword</value>
            <description>password to use against metastore database</description>
          </property>
          
          

          Then start the spark-thriftserver service (default port is 10000):

          # service spark-thriftserver start
          

          Using the Beeline client, you can open a JDBC connection to the Spark Thrift server:

          $SPARK_HOME/bin/beeline -u jdbc:hive2://THRIFT-HOSTNAME:10000
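
          Once the Thrift server is up, a quick smoke test is to run a single statement non-interactively. This assumes a reachable THRIFT-HOSTNAME and default, unauthenticated Hive settings; the query shown is just an example.

```shell
# -u gives the JDBC URL; -e executes one statement and exits.
$SPARK_HOME/bin/beeline -u jdbc:hive2://THRIFT-HOSTNAME:10000 -e "SHOW TABLES;"
```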
          
          Konstantin Boudnik added a comment -

          Is anyone from the Spark folks working on this?

          Konstantin Boudnik added a comment -

          Moving to 1.1.0 for two reasons: we are packaging Spark 1.3 right now, so I am not sure how relevant this ticket is. Also, it doesn't look like a blocker, hence I don't want to hold the releases for it. Please let me know if I am missing something and I will get it back in.

          jay vyas added a comment -

          Makes sense. Renaming to 'update puppet recipes for spark'.


            People

            • Assignee: Unassigned
            • Reporter: jay vyas (jayunit100)
            • Votes: 0
            • Watchers: 5
