Spark / SPARK-29088

Hive 2.3.6 / Hadoop 2.7.7 / Spark 2.4.4 with lz4-java.jar: insert fails with the Spark execution engine, works fine with the Hadoop (MR) engine


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.4.4
    • Fix Version/s: None
    • Component/s: Deploy
    • Labels:
      None
    • Environment:

      Linux Ubuntu 18.04, standalone

      Hive 2.3.6

      MySQL 5.7.27

      Hadoop 2.7.7

      Spark 2.4.4

      lz4-java.jar dependency added to hive/lib and spark/jars

      contents of spark/jars uploaded to HDFS under spark-jars/

      Description

      Hello,

      I installed Hadoop 2.7.7; it works fine.

      I installed Hive 2.3.6, which works fine with Hadoop 2.7.7. To avoid a class loader conflict, I replaced lz-1.3.0.jar with lz4-java-1.4.0.jar from spark/jars. Version 1.4.0 looks compatible with the old methods, so its new features should not disturb anything.

      Hive is configured with MySQL 5.7.27.

      I installed Spark 2.4.4.

      I configured hive/conf/hive-site.xml to use the Spark engine and then copied it to spark/conf:

      <property>
      <name>hive.execution.engine</name>
      <value>spark</value>
      <description>Use Map Reduce as default execution engine</description>
      </property>
      <property>
      <name>spark.master</name>
      <value>spark://192.168.0.30:7077</value>
      </property>
      <property>
      <name>spark.eventLog.enabled</name>
      <value>true</value>
      </property>
      <property>
      <name>spark.eventLog.dir</name>
      <value>/tmp</value>
      </property>
      <property>
      <name>spark.serializer</name>
      <value>org.apache.spark.serializer.KryoSerializer</value>
      </property>
      <property>
      <name>spark.yarn.jars</name>
      <value>hdfs://192.168.0.30:54310/spark-jars/*</value>
      </property>
      <property>
      <name>system:java.io.tmpdir</name>
      <value>/tmp/hive/java</value>
      </property>
      <property>
      <name>system:user.name</name>
      <value>${user.name}</value>
      </property>
      </configuration>
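One thing worth checking in a configuration like the one above: the Spark documentation describes `spark.yarn.jars` as a setting for running on YARN, while `spark.master` here points at a standalone master (`spark://...`). If that reading is right, the jars uploaded to HDFS would probably not be used, and the executors would load whatever is in the local Spark installation. The following is a minimal sketch of such a check, not a definitive diagnosis. The embedded `SAMPLE` is a shortened, well-formed copy of three of the properties above, and the helper names `load_properties` and `yarn_only_settings` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of three properties from the report (illustrative only).
SAMPLE = """<configuration>
  <property><name>hive.execution.engine</name><value>spark</value></property>
  <property><name>spark.master</name><value>spark://192.168.0.30:7077</value></property>
  <property><name>spark.yarn.jars</name><value>hdfs://192.168.0.30:54310/spark-jars/*</value></property>
</configuration>"""


def load_properties(xml_text):
    """Parse Hadoop-style <property><name/><value/></property> entries into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def yarn_only_settings(props):
    """Return the spark.yarn.* keys that are set while spark.master is not YARN."""
    master = props.get("spark.master", "")
    if master.startswith("yarn"):
        return []
    return sorted(key for key in props if key.startswith("spark.yarn."))


props = load_properties(SAMPLE)
print("engine:", props["hive.execution.engine"])
print("possibly ignored:", yarn_only_settings(props))
```

To run it against the real file, read hive-site.xml from disk and pass its text to `load_properties` instead of `SAMPLE`.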

      When I start Hive with the Spark engine (Hive works fine with the Hadoop engine):

      I can run show tables.

      I can run the query select * from employee ;

      Both work fine.

      But when I run an insert,

      it fails:

      Job failed with java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
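For readers unfamiliar with JVM notation: in the error above, `<init>` denotes a constructor and `(Ljava/io/InputStream;Z)V` is a method descriptor, so the missing member is the constructor `LZ4BlockInputStream(InputStream, boolean)`. The sketch below decodes such descriptors. It is only an illustration, and the function names `decode_descriptor` and `_read_type` are made up for this example.

```python
# One-letter codes the JVM uses for primitive types in descriptors.
PRIMITIVES = {
    "B": "byte", "C": "char", "D": "double", "F": "float",
    "I": "int", "J": "long", "S": "short", "Z": "boolean", "V": "void",
}


def _read_type(desc, i):
    """Return (readable type, next index) for the type starting at desc[i]."""
    dims = 0
    while desc[i] == "[":  # each leading '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":  # object type: L<internal/class/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:  # primitive type or void
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def decode_descriptor(desc):
    """Split a JVM method descriptor into (parameter types, return type)."""
    if not desc.startswith("("):
        raise ValueError("not a method descriptor: " + desc)
    i, params = 1, []
    while desc[i] != ")":
        type_name, i = _read_type(desc, i)
        params.append(type_name)
    return_type, _ = _read_type(desc, i + 1)
    return params, return_type


params, return_type = decode_descriptor("(Ljava/io/InputStream;Z)V")
print(params, return_type)  # ['java.io.InputStream', 'boolean'] void
```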

      I have lz4-java-1.4.0.jar in spark/jars, and I replaced lz-1.3.0.jar in hive/lib with it.

      lz-1.3.0.jar is gone, but the Spark worker still cannot find the new lz4-java-1.4.0 constructor with the signature (Ljava/io/InputStream;Z).

      I removed all the 1.2.1 jars from spark/jars and replaced them with the 2.3.6 jars from Hive.

      I uploaded all jars from spark-2.4.4/jars/* to /spark-jars/ on HDFS (Hadoop 2.7.7).

      The worker driver log shows that hive-exec-2.3.6.jar is used.

      Did I forget something? I don't see where the problem is. The lz4-java-1.4.0 jar is present and the method being called exists in it. lz-1.3.0.jar is gone, and there is no conflict in the Hadoop + Hive configuration when using the lz4-java-1.4.0 dependency.
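A NoSuchMethodError raised while the expected jar is present usually means that another copy of the same class is loaded first from a different jar. One way to look for such a copy is to scan every jar on the machine for the class file, as in the minimal sketch below. The Hive and Hadoop directories are taken from the log output in this report; `/opt/spark/jars` is an assumed location, and the helper name `find_jars_with_class` is hypothetical.

```python
import os
import zipfile

# Class file entry for net.jpountz.lz4.LZ4BlockInputStream inside a jar.
CLASS_ENTRY = "net/jpountz/lz4/LZ4BlockInputStream.class"


def find_jars_with_class(directories, class_entry=CLASS_ENTRY):
    """Return sorted paths of all jars under the directories that bundle class_entry."""
    hits = []
    for directory in directories:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                if not name.endswith(".jar"):
                    continue
                path = os.path.join(root, name)
                try:
                    with zipfile.ZipFile(path) as jar:
                        if class_entry in jar.namelist():
                            hits.append(path)
                except (zipfile.BadZipFile, OSError):
                    pass  # skip unreadable or corrupt archives
    return sorted(hits)


if __name__ == "__main__":
    search_dirs = [
        "/usr/lib/hive/apache-hive-2.3.6-bin/lib",  # from the log in this report
        "/opt/hadoop/share/hadoop",                 # from the log in this report
        "/opt/spark/jars",                          # assumption: adjust to the real Spark home
    ]
    for jar_path in find_jars_with_class(search_dirs):
        print(jar_path)
```

If more than one jar is printed, the extra ones are candidates for the conflicting copy. The scan also catches fat jars that bundle the class under a name that does not mention lz4.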

      Thanks for any remarks, because I have run out of ideas about where to look. The failure seems to happen in the map worker of the Spark engine. Do I need to add the lz4-java jar to some extra classpath somewhere?

      Some stack traces, also present in the logs:

       
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/usr/lib/hive/apache-hive-2.3.6-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

      Logging initialized using configuration in file:/usr/lib/hive/apache-hive-2.3.6-bin/conf/hive-log4j2.properties Async: true
      hive> select * from employee
          > ;
      OK
      1 Allen IT
      2 Mag Sales
      3 Rob Sales
      4 Dana IT
      6 Jean-Pierre Bordenave
      7 Pierre xXx
      11 Pierre xXx
      Time taken: 2.99 seconds, Fetched: 7 row(s)
      hive> insert into employee values("10","Pierre","xXx");
      Query ID = spark_20190915110359_e62a4e1a-fd69-4f17-a0f1-20513f291ddc
      Total jobs = 1
      Launching Job 1 out of 1
      In order to change the average load for a reducer (in bytes):
        set hive.exec.reducers.bytes.per.reducer=<number>
      In order to limit the maximum number of reducers:
        set hive.exec.reducers.max=<number>
      In order to set a constant number of reducers:
        set mapreduce.job.reduces=<number>
      Starting Spark Job = 6b9db937-53d2-4d45-84b2-8e5c6427d9d3

      Query Hive on Spark job[0] stages: [0]

      Status: Running (Hive on Spark job[0])
      --------------------------------------------------------------------------------------
                STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
      --------------------------------------------------------------------------------------
      Stage-0                  0       RUNNING      1          0        0        1       1
      --------------------------------------------------------------------------------------
      STAGES: 00/01    [>>--------------------------] 0%    ELAPSED TIME: 3,02 s
      --------------------------------------------------------------------------------------
      Job failed with java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
      FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.util.concurrent.ExecutionException: Exception thrown by job
          at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:337)
          at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:342)
          at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:362)
          at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.0.30, executor 2): java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
          at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
          at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
          at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
          at scala.Option.map(Option.scala:146)
          at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
          at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
          at scala.Option.getOrElse(Option.scala:121)
          at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
          at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
          at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
          at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
          at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
          at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
          at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
          at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
          at org.apache.spark.scheduler.Task.run(Task.scala:123)
          at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
          at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)

      Driver stacktrace:
          at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
          at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
          at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
          at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
          at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
          at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
          at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
          at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
          at scala.Option.foreach(Option.scala:257)
          at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
          at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
          at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
          at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
          at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
      Caused by: java.lang.NoSuchMethodError: net.jpountz.lz4.LZ4BlockInputStream.<init>(Ljava/io/InputStream;Z)V
          at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:122)
          at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
          at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$6.apply(TorrentBroadcast.scala:304)
          at scala.Option.map(Option.scala:146)
          at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:304)
          at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:235)
          at scala.Option.getOrElse(Option.scala:121)
          at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
          at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1326)
          at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
          at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
          at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
          at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
          at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
          at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
          at org.apache.spark.scheduler.Task.run(Task.scala:123)
          at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
          at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
          at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      
      

      Do you have a sample configuration for Spark 2.4.4 with Hive 2.3.6 somewhere? A lot of tutorials are no longer up to date. Thanks a lot.

      JP

       

            People

            • Assignee:
              Unassigned
              Reporter:
              jpbordi JP Bordenave