Bigtop / BIGTOP-3641

Hive on Spark error


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0, 3.1.0
    • Fix Version/s: None
    • Component/s: hive, spark
    • Labels: None

    Description

      Hi! I've tried to launch the Hadoop stack in Docker in two ways:

      1. successfully built hdfs, yarn, mapreduce, hbase, hive, spark, and zookeeper from the Bigtop master branch (version 3.1.0) and launched Docker from the local repo via the provisioner with all of these components (roughly the commands sketched below)
      2. the same as the first approach, but with the official Bigtop repo (version 3.0.0)

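      For reference, the build-and-provision steps were roughly the following (a sketch from memory; the exact Gradle task names and docker-hadoop.sh flags should be double-checked against the README in your Bigtop checkout):

      # build the component packages (hadoop-pkg covers hdfs/yarn/mapreduce) and a local repo
      ./gradlew hadoop-pkg hbase-pkg hive-pkg spark-pkg zookeeper-pkg repo

      # spin up a cluster from that repo with the Docker provisioner
      cd provisioner/docker
      ./docker-hadoop.sh -C config_centos-7.yaml -c 3
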
      In both cases everything else works fine, but Hive on Spark fails with an error:

      hive> set hive.execution.engine=spark;
      hive> select id, count(*) from default.test group by id;
      Query ID = root_20220209133134_cf3aec7d-ee2e-4d38-b200-6d616020d4b6
      Total jobs = 1
      Launching Job 1 out of 1
      In order to change the average load for a reducer (in bytes):
        set hive.exec.reducers.bytes.per.reducer=<number>
      In order to limit the maximum number of reducers:
        set hive.exec.reducers.max=<number>
      In order to set a constant number of reducers:
        set mapreduce.job.reduces=<number>
      Job failed with java.lang.ClassNotFoundException: oot_20220209133134_cf3aec7d-ee2e-4d38-b200-6d616020d4b6:1
      FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed during runtime. Please check stacktrace for the root cause.

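      For reference, the wiring that Hive on Spark normally needs (per the upstream "Hive on Spark: Getting Started" guide; the /usr/lib paths are my assumption about the Bigtop layout):

      # Hive needs a few Spark jars on its own classpath
      ln -s /usr/lib/spark/jars/scala-library-*.jar /usr/lib/hive/lib/
      ln -s /usr/lib/spark/jars/spark-core_*.jar /usr/lib/hive/lib/
      ln -s /usr/lib/spark/jars/spark-network-common_*.jar /usr/lib/hive/lib/

      # and the session has to point at YARN
      hive> set hive.execution.engine=spark;
      hive> set spark.master=yarn;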

      From spark-shell, the same query works fine:

      scala> sql("select id, count(*) from default.test group by id").show()
      +---+--------+                                                                  
      | id|count(1)|
      +---+--------+
      |  1|       1|
      |  2|       1|
      +---+--------+

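      For anyone trying to reproduce this, the exact schema of default.test shouldn't matter; judging by the output above, something like this is enough (the schema is my assumption):

      hive> create table default.test (id int);
      hive> insert into default.test values (1), (2);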

      I've also tried to create an HDFS directory with the Spark libraries and to set the configuration the way it was done in https://issues.apache.org/jira/browse/BIGTOP-3333, but it didn't help. Any ideas what is missing and how to fix it?
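
      Concretely, the BIGTOP-3333-style workaround looked roughly like this (spark.yarn.jars is a standard Spark property; the /spark-jars path is an arbitrary choice of mine):

      # publish the Spark runtime jars to HDFS so YARN containers can fetch them
      hdfs dfs -mkdir -p /spark-jars
      hdfs dfs -put /usr/lib/spark/jars/*.jar /spark-jars/

      # then point Hive's Spark session at them (hive-site.xml or per session)
      hive> set spark.master=yarn;
      hive> set spark.yarn.jars=hdfs:///spark-jars/*;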

      P.S. Spark is used in spark-on-yarn mode.
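
      Since the error message only says to check the stacktrace, the full trace should end up in the YARN container logs for the Hive-on-Spark application (application_XXXX_YYYY below is a placeholder for whatever id YARN assigned):

      yarn application -list
      yarn logs -applicationId application_XXXX_YYYY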


          People

            Assignee: Unassigned
            Reporter: Andrew (affei)
