Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: v2.5.0
- Fix Version: None
- Component: test
- Labels: Important
Description
Hi, I am getting an error at step #7 when building the cube with Spark on AWS EMR; the same cube builds fine with MapReduce. The error is as follows:
18/11/12 21:45:01 INFO yarn.Client:
     client token: N/A
     diagnostics: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.storage.hbase.steps.SparkCubeHFile. Root cause: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 8, ip-10-81-117-151.wfg1tst.cltest.wellmanage.com, executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.apache.kylin.metadata.datatype.DataType
    at java.io.ObjectStreamClass.hasStaticInitializer(Native Method)
    at java.io.ObjectStreamClass.computeDefaultSUID(ObjectStreamClass.java:1787)
    at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72)
    at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:253)
    at java.io.ObjectStreamClass$1.run(ObjectStreamClass.java:251)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.io.ObjectStreamClass.getSerialVersionUID(ObjectStreamClass.java:250)
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:611)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1843)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1933)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1529)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at java.util.HashSet.readObject(HashSet.java:333)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2136)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
     ApplicationMaster host: 10.81.117.160
     ApplicationMaster RPC port: 0
     queue: default
     start time: 1542059005981
     final status: FAILED
     tracking URL: http://ip-10-81-117-170.wfg1tst.cltest.wellmanage.com:20888/proxy/application_1541998641232_0122/
     user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1541998641232_0122 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1180)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1226)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:744)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/11/12 21:45:01 INFO util.ShutdownHookManager: Shutdown hook called
18/11/12 21:45:01 INFO util.ShutdownHookManager: Deleting directory /mnt/tmp/spark-b4ba04ad-6cd7-4411-a48c-b1faada49837

The command is:

export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/local/kylin/spark/bin/spark-submit \
    --class org.apache.kylin.common.util.SparkEntry \
    --conf spark.executor.cores=1 \
    --conf spark.hadoop.yarn.timeline-service.enabled=false \
    --conf spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec \
    --conf spark.executor.extraJavaOptions=-Dhdp.version=current \
    --conf spark.master=yarn \
    --conf spark.hadoop.mapreduce.output.fileoutputformat.compress=true \
    --conf spark.executor.instances=40 \
    --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current \
    --conf spark.executor.memory=1G \
    --conf spark.yarn.queue=default \
    --conf spark.submit.deployMode=cluster \
    --conf spark.dynamicAllocation.minExecutors=1 \
    --conf spark.network.timeout=600 \
    --conf spark.hadoop.dfs.replication=2 \
    --conf spark.yarn.executor.memoryOverhead=1024 \
    --conf spark.dynamicAllocation.executorIdleTimeout=300 \
    --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history \
    --conf spark.driver.memory=2G \
    --conf spark.driver.extraJavaOptions=-Dhdp.version=current \
    --conf spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec \
    --conf spark.eventLog.enabled=true \
    --conf spark.shuffle.service.enabled=true \
    --conf spark.eventLog.dir=hdfs:///kylin/spark-history \
    --conf spark.yarn.archive=hdfs://ip-10-81-117-170.wfg1tst.cltest.wellmanage.com:8020/kylin/spark/spark-libs.jar \
    --conf spark.dynamicAllocation.maxExecutors=1000 \
    --conf spark.dynamicAllocation.enabled=true \
    --jars /usr/lib/hbase/lib/hbase-common-1.3.1.jar,/usr/lib/hbase/lib/hbase-server-1.3.1.jar,/usr/lib/hbase/lib/hbase-client-1.3.1.jar,/usr/lib/hbase/lib/hbase-protocol-1.3.1.jar,/usr/lib/hbase/lib/hbase-hadoop-compat-1.3.1.jar,/usr/lib/hbase/lib/htrace-core-3.1.0-incubating.jar,/usr/lib/hbase/lib/metrics-core-2.2.0.jar \
    /usr/local/kylin/lib/kylin-job-2.5.0.jar \
    -className org.apache.kylin.storage.hbase.steps.SparkCubeHFile \
    -partitions s3://wfg1tst-models/kylin/kylin_metadata/kylin-e722447f-33ff-0ff7-7440-3dc884e8f6a7/kylin_sales_cube/rowkey_stats/part-r-00000_hfile \
    -counterOutput s3://wfg1tst-models/kylin/kylin_metadata/kylin-e722447f-33ff-0ff7-7440-3dc884e8f6a7/kylin_sales_cube/counter \
    -cubename kylin_sales_cube \
    -output s3://wfg1tst-models/kylin/kylin_metadata/kylin-e722447f-33ff-0ff7-7440-3dc884e8f6a7/kylin_sales_cube/hfile \
    -input s3://wfg1tst-models/kylin/kylin_metadata/kylin-e722447f-33ff-0ff7-7440-3dc884e8f6a7/kylin_sales_cube/cuboid/ \
    -segmentId ae2b1e37-42dd-2b48-b29e-d152c915281f \
    -metaUrl kylin_metadata@hdfs,path=s3://wfg1tst-models/kylin/kylin_metadata/kylin-e722447f-33ff-0ff7-7440-3dc884e8f6a7/kylin_sales_cube/metadata \
    -hbaseConfPath s3://wfg1tst-models/kylin/kylin_metadata/kylin-e722447f-33ff-0ff7-7440-3dc884e8f6a7/hbase-conf.xml
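For context on the error itself: "NoClassDefFoundError: Could not initialize class X" means the JVM already tried to initialize X once and the static initializer failed; the original cause (often a missing dependency on the executor classpath) is reported only on the first attempt. A minimal demo of this JVM behavior (not Kylin code; the class names here are made up for illustration):

```java
// Demo of the JVM rule behind "Could not initialize class X": if a class's
// static initializer throws, the FIRST access fails with
// ExceptionInInitializerError (carrying the real root cause), and every LATER
// access fails with NoClassDefFoundError -- which is what the executor log
// shows for org.apache.kylin.metadata.datatype.DataType.
public class StaticInitDemo {
    static class Broken {
        static int value = fail();                 // runs during class initialization
        static int fail() { throw new RuntimeException("simulated init failure"); }
    }

    // Touch the class and report which error the JVM raises.
    static String touch() {
        try {
            return "ok:" + Broken.value;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println("first access:  " + touch());  // ExceptionInInitializerError
        System.out.println("second access: " + touch());  // NoClassDefFoundError
    }
}
```

So the task shown as "most recent failure" may hide the real cause; the first failed task attempt in the executor logs usually carries the underlying exception.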
EMR version: emr-5.8.2
Hadoop version: Hadoop 2.7.3-amzn-3
Spark: the Spark distribution bundled with Kylin, configured per the notes at http://kylin.apache.org/docs/install/kylin_aws_emr.html
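Since the executors fail to initialize a Kylin class, one diagnostic worth trying (a hypothetical sketch, not something from the report) is to check whether the archive that spark.yarn.archive points at, once copied locally, actually contains the class in question:

```java
import java.io.IOException;
import java.util.zip.ZipFile;

// Hypothetical diagnostic (not part of the report): inspect a jar -- e.g. a
// local copy of the spark-libs.jar referenced by spark.yarn.archive -- for the
// .class entry of the class whose static initialization failed on the executors.
public class JarClassCheck {
    /** Return true if the jar at jarPath contains the .class entry for className. */
    static boolean contains(String jarPath, String className) throws IOException {
        String entry = className.replace('.', '/') + ".class";
        try (ZipFile jar = new ZipFile(jarPath)) {   // jar files are zip archives
            return jar.getEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // usage: java JarClassCheck /path/to/spark-libs.jar
        System.out.println(contains(args[0], "org.apache.kylin.metadata.datatype.DataType"));
    }
}
```

If the class is missing from the archive (or present in conflicting versions across the --jars list), that would be consistent with the initialization failure above.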