ZEPPELIN-1728

Assigning HiveContext(sc) to a variable 2nd time gives errors


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Information Provided
    • Affects Version/s: 0.6.2, 0.7.0
    • Fix Version/s: 0.7.0
    • Component/s: Core, pySpark, zeppelin-server
    • Labels: None
    • Environment: Spark 1.6 that comes with CDH 5.8.3.
      Zeppelin 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz from apache.org

    Description

      Assigning HiveContext(sc) to a variable a second time gives "You must build Spark with Hive. Export 'SPARK_HIVE=true'".

      It's only fixable by restarting Zeppelin.

      I'm getting:
      You must build Spark with Hive. Export 'SPARK_HIVE=true'
      See the full stack trace in (2) below.

      I'm using Spark 1.6 that comes with CDH 5.8.3.
      So it's definitely compiled with Hive.
      We use Jupyter notebooks without problems in the same environment.

      Using Zeppelin 0.6.2, downloaded as zeppelin-0.6.2-bin-all.tgz from apache.org.

      Is Zeppelin compiled with Hive too? I guess so.
      Not sure what else is missing.

      I tried playing with ZEPPELIN_SPARK_USEHIVECONTEXT, but it does not make a difference.
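      For context, the misleading message appears to come from PySpark 1.6 itself: `pyspark/sql/context.py` wraps any `Py4JError` raised while constructing the JVM-side HiveContext in the generic "build Spark with Hive" exception, so the real cause of the second failure is hidden. A minimal plain-Python sketch of that wrapping (the Derby-metastore-lock cause and the `_get_hive_ctx` failure mode are assumptions, not confirmed from the source):

```python
class Py4JError(Exception):
    """Stand-in for py4j.protocol.Py4JError (no Spark needed for this sketch)."""

def _get_hive_ctx():
    # Assumption: the second HiveContext(sc) fails on the JVM side, e.g.
    # because an embedded Derby metastore allows only a single connection.
    raise Py4JError("Another instance of Derby may have already booted the database")

def make_hive_context():
    # Mirrors the except clause in pyspark/sql/context.py (Spark 1.6):
    # any Py4JError during construction is re-raised as the generic
    # "build Spark with Hive" error, regardless of the actual cause.
    try:
        return _get_hive_ctx()
    except Py4JError:
        raise Exception("You must build Spark with Hive. "
                        "Export 'SPARK_HIVE=true' and run build/sbt assembly")
```

      So the message does not necessarily mean Spark was built without Hive support; it only means the JVM-side HiveContext construction failed for some reason.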

      (1)

      $ cat zeppelin-env.sh
      export JAVA_HOME=/usr/java/java7
      export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
      export SPARK_SUBMIT_OPTIONS="--principal xxxx --keytab yyy --conf spark.driver.memory=7g --conf spark.executor.cores=2 --conf spark.executor.memory=8g"
      export SPARK_APP_NAME="Zeppelin notebook"
      export HADOOP_CONF_DIR=/etc/hadoop/conf
      export HIVE_CONF_DIR=/etc/hive/conf
      export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
      export PYSPARK_PYTHON="/opt/cloudera/parcels/Anaconda/bin/python2"
      export PYTHONPATH="/opt/cloudera/parcels/CDH/lib/spark/python:/opt/cloudera/parcels/CDH/lib/spark/python/lib/py4j-0.9-src.zip"
      export MASTER="yarn-client"
      export ZEPPELIN_SPARK_USEHIVECONTEXT=true
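
      For what it's worth, the env var above maps to a Spark interpreter property that can also be set in Zeppelin's Interpreter settings UI; a sketch assuming the 0.6.x property name:

      # Interpreter > spark in the Zeppelin UI (assumed property name for 0.6.x)
      zeppelin.spark.useHiveContext = true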
      

      (2)

      You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly
      Traceback (most recent call last):
        File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 267, in <module>
          raise Exception(traceback.format_exc())
      Exception: Traceback (most recent call last):
        File "/tmp/zeppelin_pyspark-9143637669637506477.py", line 265, in <module>
          exec(code)
        File "<stdin>", line 9, in <module>
        File "/opt/cloudera/parcels/CDH/lib/spark/python/pyspark/sql/context.py", line 580, in sql
      

      (3)

      I also have the correct symlinks in zeppelin_home/conf for:
      - hive-site.xml
      - hdfs-site.xml
      - core-site.xml
      - yarn-site.xml
      

    People

      Assignee: Lee Moon Soo (moon)
      Reporter: Ruslan Dautkhanov (Tagar)
      Votes: 0
      Watchers: 4

              Dates

                Created:
                Updated:
                Resolved: