Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 0.5.0
Fix Version/s: None
Component/s: None
Failed Environment:
-------------------
[EMR 6.0.0]
Hadoop 3.2.1
Hive 3.1.2 (aws glue metastore)
Spark 2.4.4
Scala version 2.12.10
Apache Maven 3.5.2
Java version: 1.8.0_242, vendor: Amazon.com Inc
OS name: "linux", version: "4.14.165-133.209.amzn2.x86_64", arch: "amd64", family: "unix"
griffin-0.5.0 (mvn --projects measure --also-make clean install -Dmaven.test.skip=true)
Success Environment:
---------------------
[EMR 5.30.1]
Hadoop 2.8.5
Hive 2.3.6 (aws glue metastore)
Spark 2.4.5
Scala version 2.11.12
Apache Maven 3.5.2
Java version: 1.8.0_252, vendor: Amazon.com Inc.
OS name: "linux", version: "4.14.173-137.229.amzn2.x86_64", arch: "amd64", family: "unix"
griffin-0.5.0 (mvn --projects measure --also-make clean install -Dmaven.test.skip=true)
Description
Issue:
Built Griffin 0.5.0 without the UI and unit tests: [mvn --projects measure --also-make clean install -Dmaven.test.skip=true]
Running sample code (similar to the quick-start code) with this build failed on EMR 6.0.0, but the same code ran successfully on EMR 5.30.1.
Questions:
1. Is the failure caused by the Hadoop/Hive version or by the Scala version? (See the version-check sketch below.)
2. If the versions are the issue, what is the plan for supporting them in future releases?
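For reference, one way to narrow question 1 down is to check which Scala binary version the cluster's Spark driver actually runs, since the measure jar must be built against the same one. Below is a minimal, hypothetical diagnostic sketch (the object name ScalaVersionCheck is illustrative and not part of Griffin); it can be packaged and submitted the same way as the measure jar.

import org.apache.spark.sql.SparkSession

// Hypothetical diagnostic, not part of Griffin: prints the Scala and Spark
// versions visible to the driver, to confirm whether the cluster runs
// Scala 2.11 (EMR 5.x) or Scala 2.12 (EMR 6.x).
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("scala-version-check")
      .getOrCreate()

    // e.g. "version 2.12.10" on EMR 6.0.0, "version 2.11.12" on EMR 5.30.1
    println(s"Scala on driver: ${scala.util.Properties.versionString}")
    // e.g. "2.4.4" on EMR 6.0.0, "2.4.5" on EMR 5.30.1
    println(s"Spark version:   ${spark.version}")

    spark.stop()
  }
}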
Failed Case:
Using the Failed Environment described above ([EMR 6.0.0], Hadoop 3.2.1, Hive 3.1.2 with aws glue metastore, Spark 2.4.4, Scala 2.12.10, griffin-0.5.0 measure jar).
Command:
spark-submit --class org.apache.griffin.measure.Application --master yarn --deploy-mode client \
  --queue default --driver-memory 1g --executor-memory 1g --num-executors 2 \
  /home/hadoop/griffin/griffin-0.5.0/measure/target/measure-0.5.0.jar /home/hadoop/env.json /home/hadoop/dq.json
Error:
20/06/30 05:56:51 INFO Application$: [Ljava.lang.String;@3561c410
20/06/30 05:56:51 INFO Application$: /home/hadoop/env.json
20/06/30 05:56:51 INFO Application$: /home/hadoop/dq.json
Exception in thread "main" java.lang.NoClassDefFoundError: scala/Product$class
	at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamFileReader.<init>(ParamFileReader.scala:36)
	at org.apache.griffin.measure.configuration.dqdefinition.reader.ParamReaderFactory$.getParamReader(ParamReaderFactory.scala:36)
	at org.apache.griffin.measure.Application$.readParamFile(Application.scala:122)
	at org.apache.griffin.measure.Application$.main(Application.scala:51)
	at org.apache.griffin.measure.Application.main(Application.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: scala.Product$class
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	... 17 more
20/06/30 05:56:51 INFO ShutdownHookManager: Shutdown hook called
20/06/30 05:56:51 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-20c5087b-8d2b-4358-92aa-7a398f7078ef
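My reading of this error (to be confirmed): scala/Product$class is a synthetic class that the Scala 2.11 compiler generates to hold concrete trait method implementations; Scala 2.12 encodes traits differently and its standard library no longer ships it. A NoClassDefFoundError on scala.Product$class therefore usually means a jar compiled against Scala 2.11 (the default for the griffin-0.5.0 measure build) is being loaded on a Scala 2.12 runtime, which is what Spark 2.4.4 on EMR 6.0.0 uses. A minimal sketch of the same failure mode, independent of Griffin (EnvParam and Repro are illustrative names only):

// Compile this with scalac 2.11.x into a jar, then run it with the Scala 2.12
// library on the classpath instead of 2.11: loading EnvParam fails with
// java.lang.NoClassDefFoundError: scala/Product$class, because the 2.11
// bytecode forwards inherited Product methods (productIterator, etc.) to the
// synthetic scala.Product$class, which the 2.12 library no longer provides.
case class EnvParam(name: String, value: String)  // case classes mix in scala.Product

object Repro {
  def main(args: Array[String]): Unit = {
    // Instantiating the case class is enough to trigger the class-load failure.
    println(EnvParam("spark.master", "yarn"))
  }
}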
Success Case:
However, the same command succeeds on the older EMR release with the Success Environment described above ([EMR 5.30.1], Hadoop 2.8.5, Hive 2.3.6 with aws glue metastore, Spark 2.4.5, Scala 2.11.12, griffin-0.5.0 measure jar).
Command:
spark-submit --class org.apache.griffin.measure.Application --master yarn --deploy-mode client \
  --queue default --driver-memory 1g --executor-memory 1g --num-executors 2 \
  /home/hadoop/griffin/griffin-0.5.0/measure/target/measure-0.5.0.jar /home/hadoop/env.json /home/hadoop/dq.json
Output:
data source timeRanges: src -> (1593499903348, 1593499903348], tgt -> (1593499903348, 1593499903348]
[1593499903348] batch_accu start: application_1593497650758_0002
batch_accu [1593499903348] metrics: {"name":"batch_accu","tmst":1593499903348,"value":{"total_count":10,"miss_count":3,"matched_count":7,"matchedFraction":0.7},"applicationId":"application_1593497650758_0002"}
[1593499903348] 1593499990953: process using time: 87605 ms
[1593499903348] batch_accu finish