Kylin / KYLIN-4522

Could not initialize class org.apache.hadoop.hbase.io.hfile.HFile Kylin 2.6.6 EMR 5.19

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: v2.6.6
    • Fix Version/s: None
    • Component/s: Environment, Job Engine, Others
    • Labels:
      None
    • Environment:
      Release label: emr-5.19.0

      Hadoop distribution: Amazon 2.8.5

      Applications: Hive 2.3.3, HBase 1.4.7, Spark 2.3.2, Livy 0.5.0, ZooKeeper 3.4.13, Sqoop 1.4.7, Oozie 5.0.0, Pig 0.17.0, HCatalog 2.3.3

      Description

      Hi,

      I am trying to build the sample kylin_sales_cube with the Spark engine on an Amazon EMR cluster. I saw issue KYLIN-3931, where the suggestion is to use the 2.6.6 engine for Hadoop 3. On EMR, Hadoop 3 is only available in EMR 6.0, which is very recent. I tried to set up versions 2.6.6 and 3.0.2 for Hadoop 3 there, but in both cases the Kylin web site does not come up (Error 404 - Not Found). So I switched to EMR 5.19, which has the same version of Spark (2.3.2) that Kylin 2.6.6 uses.

      The build fails with the error "java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.hfile.HFile".
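      For context: the JVM reports "Could not initialize class X" when the static initializer of X has already failed once in the same JVM. The first failure is normally logged as java.lang.ExceptionInInitializerError together with the underlying cause, so the executor logs are the place to look. Below is a minimal sketch, assuming the executor logs have been collected into a text file (for example with `yarn logs -applicationId <id>`), that lists the causes recorded after such an error. The function name is hypothetical and the parsing is deliberately simple.

```python
import re

# First-time failure of a static initializer; later attempts in the same JVM
# only report "NoClassDefFoundError: Could not initialize class ...".
INIT_ERROR = re.compile(r"java\.lang\.ExceptionInInitializerError")
CAUSED_BY = re.compile(r"Caused by: ([\w.$]+)(?::\s*(.*))?")


def find_initializer_causes(log_text):
    """Return (exception, message) pairs for every 'Caused by' line that
    appears after the first ExceptionInInitializerError in log_text.

    This is a rough filter: it does not try to detect where one stack
    trace ends and the next one begins.
    """
    causes = []
    seen_init_error = False
    for line in log_text.splitlines():
        if INIT_ERROR.search(line):
            seen_init_error = True
            continue
        match = CAUSED_BY.search(line)
        if seen_init_error and match:
            causes.append((match.group(1), (match.group(2) or "").strip()))
    return causes
```

      If the result is empty, the first failure may have happened on a different executor or may have been rotated out of the collected logs.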

      I have already copied the following jars to the Spark jars folder, as suggested by the documentation and by what I read in the kylin-issues mailing list archives:

      /usr/lib/hbase/hbase-hadoop-compat-1.4.7.jar
      /usr/lib/hbase/hbase-hadoop2-compat-1.4.7.jar
      /usr/lib/hbase/lib/hbase-common-1.4.7-tests.jar
      /usr/lib/hbase/lib/hbase-common-1.4.7.jar
      /usr/lib/hbase/hbase-client.jar
      /usr/lib/hbase/hbase-client-1.4.7.jar
      /usr/lib/hbase/hbase-server-1.4.7.jar
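      One way to see which of these jars actually provide the failing class is to scan the jar directories for it. This is a minimal sketch: the helper name is hypothetical, and the directories passed to it are assumptions that have to match the cluster layout (the paths in this report use both /usr/lib/hbase and /usr/lib/hbase/lib).

```python
import zipfile
from pathlib import Path


def jars_containing(class_name, jar_dirs):
    """Return paths of jars directly under jar_dirs that contain class_name."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar_dir in jar_dirs:
        for jar in sorted(Path(jar_dir).glob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if entry in archive.namelist():
                        hits.append(str(jar))
            except (zipfile.BadZipFile, OSError):
                # Skip broken symlinks and unreadable or corrupt archives.
                continue
    return hits
```

      For example, `jars_containing("org.apache.hadoop.hbase.io.hfile.HFile", ["/usr/lib/hbase", "/usr/lib/hbase/lib"])` lists every jar in those two directories that ships the class. No hits would mean the class is not in those directories, and several hits from different versions could point to a conflict. Neither result proves the cause by itself, since a class that fails to initialize is present but cannot finish loading.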

       

      This is the output shown on the failed step:

      org.apache.kylin.engine.spark.exception.SparkException: OS command error exit with return code: 1, error message:
      20/05/25 14:03:46 WARN SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
      20/05/25 14:03:47 INFO RMProxy: Connecting to ResourceManager at ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal/XXX.XXX.XXX.XXX:8032
      20/05/25 14:03:49 INFO Client: Requesting a new application from cluster with 4 NodeManagers
      20/05/25 14:03:49 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (6144 MB per container)
      20/05/25 14:03:49 INFO Client: Will allocate AM container, with 5632 MB memory including 512 MB overhead
      20/05/25 14:03:49 INFO Client: Setting up container launch context for our AM
      20/05/25 14:03:49 INFO Client: Setting up the launch environment for our AM container
      20/05/25 14:03:49 INFO Client: Preparing resources for our AM container
      20/05/25 14:03:51 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
      20/05/25 14:03:54 INFO Client: Uploading resource file:/mnt/tmp/spark-d26c4f1f-1b8a-4cf8-a05b-842294ce017d/__spark_libs__4034657074333893156.zip -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/__spark_libs__4034657074333893156.zip
      20/05/25 14:03:54 INFO Client: Uploading resource file:/usr/local/kylin/apache-kylin-2.6.6-bin-hbase1x/lib/kylin-job-2.6.6.jar -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/kylin-job-2.6.6.jar
      20/05/25 14:03:55 INFO Client: Uploading resource file:/usr/lib/hbase/lib/hbase-common-1.4.7.jar -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/hbase-common-1.4.7.jar
      20/05/25 14:03:55 INFO Client: Uploading resource file:/usr/lib/hbase/lib/hbase-server-1.4.7.jar -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/hbase-server-1.4.7.jar
      20/05/25 14:03:55 INFO Client: Uploading resource file:/usr/lib/hbase/lib/hbase-client-1.4.7.jar -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/hbase-client-1.4.7.jar
      20/05/25 14:03:55 INFO Client: Uploading resource file:/usr/lib/hbase/lib/hbase-protocol-1.4.7.jar -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/hbase-protocol-1.4.7.jar
      20/05/25 14:03:55 INFO Client: Uploading resource file:/usr/lib/hbase/lib/hbase-hadoop-compat-1.4.7.jar -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/hbase-hadoop-compat-1.4.7.jar
      20/05/25 14:03:56 INFO Client: Uploading resource file:/usr/lib/hbase/lib/htrace-core-3.1.0-incubating.jar -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/htrace-core-3.1.0-incubating.jar
      20/05/25 14:03:56 INFO Client: Uploading resource file:/usr/lib/hbase/lib/metrics-core-2.2.0.jar -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/metrics-core-2.2.0.jar
      20/05/25 14:03:56 WARN Client: Same path resource file:///usr/lib/hbase/lib/hbase-hadoop-compat-1.4.7.jar added multiple times to distributed cache.
      20/05/25 14:03:56 INFO Client: Uploading resource file:/usr/lib/hbase/lib/hbase-hadoop2-compat-1.4.7.jar -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/hbase-hadoop2-compat-1.4.7.jar
      20/05/25 14:03:56 INFO Client: Uploading resource file:/etc/spark/conf/hive-site.xml -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/hive-site.xml
      20/05/25 14:03:56 INFO Client: Uploading resource file:/mnt/tmp/spark-d26c4f1f-1b8a-4cf8-a05b-842294ce017d/__spark_conf__1997289269037988671.zip -> hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1590337422418_0043/__spark_conf__.zip
      20/05/25 14:03:56 INFO SecurityManager: Changing view acls to: hadoop
      20/05/25 14:03:56 INFO SecurityManager: Changing modify acls to: hadoop
      20/05/25 14:03:56 INFO SecurityManager: Changing view acls groups to:
      20/05/25 14:03:56 INFO SecurityManager: Changing modify acls groups to:
      20/05/25 14:03:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
      20/05/25 14:03:56 INFO Client: Submitting application application_1590337422418_0043 to ResourceManager
      20/05/25 14:03:56 INFO YarnClientImpl: Submitted application application_1590337422418_0043
      20/05/25 14:03:57 INFO Client: Application report for application_1590337422418_0043 (state: ACCEPTED)
      20/05/25 14:03:57 INFO Client:
          client token: N/A
          diagnostics: AM container is launched, waiting for AM container to Register with RM
          ApplicationMaster host: N/A
          ApplicationMaster RPC port: -1
          queue: default
          start time: 1590415436952
          final status: UNDEFINED
          tracking URL: http://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:20888/proxy/application_1590337422418_0043/
          user: hadoop
      20/05/25 14:03:58 INFO Client: Application report for application_1590337422418_0043 (state: ACCEPTED)
      20/05/25 14:03:59 INFO Client: Application report for application_1590337422418_0043 (state: ACCEPTED)
      20/05/25 14:04:00 INFO Client: Application report for application_1590337422418_0043 (state: ACCEPTED)
      20/05/25 14:04:01 INFO Client: Application report for application_1590337422418_0043 (state: ACCEPTED)
      20/05/25 14:04:02 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:02 INFO Client:
          client token: N/A
          diagnostics: N/A
          ApplicationMaster host: XXX.XXX.XXX.XXX
          ApplicationMaster RPC port: 0
          queue: default
          start time: 1590415436952
          final status: UNDEFINED
          tracking URL: http://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:20888/proxy/application_1590337422418_0043/
          user: hadoop
      20/05/25 14:04:03 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:04 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:05 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:06 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:07 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:08 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:09 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:10 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:11 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:12 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:13 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:14 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:15 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:16 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:17 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:18 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:19 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:21 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:22 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:23 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:24 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:25 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:26 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:27 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:28 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:29 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:30 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:31 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:32 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:33 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:34 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:35 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:36 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:37 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:38 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:39 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:40 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:41 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:42 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:43 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:44 INFO Client: Application report for application_1590337422418_0043 (state: ACCEPTED)
      20/05/25 14:04:44 INFO Client:
          client token: N/A
          diagnostics: AM container is launched, waiting for AM container to Register with RM
          ApplicationMaster host: N/A
          ApplicationMaster RPC port: -1
          queue: default
          start time: 1590415436952
          final status: UNDEFINED
          tracking URL: http://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:20888/proxy/application_1590337422418_0043/
          user: hadoop
      20/05/25 14:04:45 INFO Client: Application report for application_1590337422418_0043 (state: ACCEPTED)
      20/05/25 14:04:46 INFO Client: Application report for application_1590337422418_0043 (state: ACCEPTED)
      20/05/25 14:04:47 INFO Client: Application report for application_1590337422418_0043 (state: ACCEPTED)
      20/05/25 14:04:48 INFO Client: Application report for application_1590337422418_0043 (state: ACCEPTED)
      20/05/25 14:04:49 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:49 INFO Client:
          client token: N/A
          diagnostics: N/A
          ApplicationMaster host: XXX.XXX.XXX.XXX
          ApplicationMaster RPC port: 0
          queue: default
          start time: 1590415436952
          final status: UNDEFINED
          tracking URL: http://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:20888/proxy/application_1590337422418_0043/
          user: hadoop
      20/05/25 14:04:50 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:51 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:52 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:53 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:54 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:55 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:56 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:57 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:58 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:04:59 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:00 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:01 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:02 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:03 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:04 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:05 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:06 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:07 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:08 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:09 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:10 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:11 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:12 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:13 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:14 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:15 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:16 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:17 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:18 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:19 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:20 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:21 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:22 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:23 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:24 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:25 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:26 INFO Client: Application report for application_1590337422418_0043 (state: RUNNING)
      20/05/25 14:05:27 INFO Client: Application report for application_1590337422418_0043 (state: FINISHED)
      20/05/25 14:05:27 INFO Client:
          client token: N/A
          diagnostics: User class threw exception: java.lang.RuntimeException: error execute org.apache.kylin.storage.hbase.steps.SparkCubeHFile. Root cause: Job aborted.
              at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
              at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
          Caused by: org.apache.spark.SparkException: Job aborted.
              at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:100)
              at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1083)
              at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
              at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1081)
              at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
              at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
              at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
              at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1081)
              at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopDataset(JavaPairRDD.scala:831)
              at org.apache.kylin.storage.hbase.steps.SparkCubeHFile.execute(SparkCubeHFile.java:238)
              at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
              ... 6 more
          Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 15, ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal, executor 3): org.apache.spark.SparkException: Task failed while writing rows
              at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:155)
              at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
              at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
              at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
              at org.apache.spark.scheduler.Task.run(Task.scala:109)
              at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
          Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.hfile.HFile
              at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.<init>(StoreFile.java:880)
              at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.<init>(StoreFile.java:805)
              at org.apache.hadoop.hbase.regionserver.StoreFile$WriterBuilder.build(StoreFile.java:739)
              at org.apache.kylin.storage.hbase.steps.HFileOutputFormat3$1.getNewWriter(HFileOutputFormat3.java:224)
              at org.apache.kylin.storage.hbase.steps.HFileOutputFormat3$1.write(HFileOutputFormat3.java:181)
              at org.apache.kylin.storage.hbase.steps.HFileOutputFormat3$1.write(HFileOutputFormat3.java:153)
              at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.write(SparkHadoopWriter.scala:356)
              at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:130)
              at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127)
              at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
              at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139)
              ... 8 more
          Driver stacktrace:
              at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1803)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1791)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1790)
              at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
              at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
              at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1790)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
              at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:871)
              at scala.Option.foreach(Option.scala:257)
              at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:871)
              at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2024)
              at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1973)
              at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1962)
              at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
              at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:682)
              at org.apache.spark.SparkContext.runJob(SparkContext.scala:2034)
              at org.apache.spark.SparkContext.runJob(SparkContext.scala:2055)
              at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
              at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78)
              ... 16 more
          Caused by: org.apache.spark.SparkException: Task failed while writing rows
              at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:155)
              at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:83)
              at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$3.apply(SparkHadoopWriter.scala:78)
              at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
              at org.apache.spark.scheduler.Task.run(Task.scala:109)
              at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at java.lang.Thread.run(Thread.java:748)
          Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.io.hfile.HFile
              at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.<init>(StoreFile.java:880)
              at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.<init>(StoreFile.java:805)
              at org.apache.hadoop.hbase.regionserver.StoreFile$WriterBuilder.build(StoreFile.java:739)
              at org.apache.kylin.storage.hbase.steps.HFileOutputFormat3$1.getNewWriter(HFileOutputFormat3.java:224)
              at org.apache.kylin.storage.hbase.steps.HFileOutputFormat3$1.write(HFileOutputFormat3.java:181)
              at org.apache.kylin.storage.hbase.steps.HFileOutputFormat3$1.write(HFileOutputFormat3.java:153)
              at org.apache.spark.internal.io.HadoopMapReduceWriteConfigUtil.write(SparkHadoopWriter.scala:356)
              at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:130)
              at org.apache.spark.internal.io.SparkHadoopWriter$$anonfun$4.apply(SparkHadoopWriter.scala:127)
              at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
              at org.apache.spark.internal.io.SparkHadoopWriter$.org$apache$spark$internal$io$SparkHadoopWriter$$executeTask(SparkHadoopWriter.scala:139)
              ... 8 more
          ApplicationMaster host: XXX.XXX.XXX.XXX
          ApplicationMaster RPC port: 0
          queue: default
          start time: 1590415436952
          final status: FAILED
          tracking URL: http://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:20888/proxy/application_1590337422418_0043/
          user: hadoop
      Exception in thread "main" org.apache.spark.SparkException: Application application_1590337422418_0043 finished with failed status
          at org.apache.spark.deploy.yarn.Client.run(Client.scala:1165)
          at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1520)
          at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
          at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
          at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
          at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
          at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      20/05/25 14:05:27 INFO ShutdownHookManager: Shutdown hook called
      20/05/25 14:05:27 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-04e9eed4-d16e-406c-9fb0-972cf355db09
      20/05/25 14:05:27 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-d26c4f1f-1b8a-4cf8-a05b-842294ce017d
      The command is:
      export HADOOP_CONF_DIR=/etc/hadoop/conf && /usr/lib/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry  --conf spark.executor.instances=40  --conf spark.yarn.queue=default  --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history  --conf spark.master=yarn  --conf spark.hadoop.yarn.timeline-service.enabled=false  --conf spark.executor.memory=5G  --conf spark.eventLog.enabled=true  --conf spark.eventLog.dir=hdfs:///kylin/spark-history  --conf spark.yarn.executor.memoryOverhead=1024  --conf spark.driver.memory=5G  --conf spark.submit.deployMode=cluster  --conf spark.shuffle.service.enabled=true --jars /usr/lib/hbase/lib/hbase-common-1.4.7.jar,/usr/lib/hbase/lib/hbase-server-1.4.7.jar,/usr/lib/hbase/lib/hbase-client-1.4.7.jar,/usr/lib/hbase/lib/hbase-protocol-1.4.7.jar,/usr/lib/hbase/lib/hbase-hadoop-compat-1.4.7.jar,/usr/lib/hbase/lib/htrace-core-3.1.0-incubating.jar,/usr/lib/hbase/lib/metrics-core-2.2.0.jar,/usr/lib/hbase/lib/hbase-hadoop-compat-1.4.7.jar,/usr/lib/hbase/lib/hbase-hadoop2-compat-1.4.7.jar, /usr/local/kylin/apache-kylin-2.6.6-bin-hbase1x/lib/kylin-job-2.6.6.jar -className org.apache.kylin.storage.hbase.steps.SparkCubeHFile -partitions hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/kylin/kylin_metadata/kylin-b75c7f69-2ebf-c5c3-4a6e-b01f177d911f/kylin_sales_cube/rowkey_stats/part-r-00000_hfile -counterOutput hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/kylin/kylin_metadata/kylin-b75c7f69-2ebf-c5c3-4a6e-b01f177d911f/kylin_sales_cube/counter -cubename kylin_sales_cube -output hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/kylin/kylin_metadata/kylin-b75c7f69-2ebf-c5c3-4a6e-b01f177d911f/kylin_sales_cube/hfile -input hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/kylin/kylin_metadata/kylin-b75c7f69-2ebf-c5c3-4a6e-b01f177d911f/kylin_sales_cube/cuboid/ -segmentId 0d22a9ac-5256-02cd-a5b9-44de5247871f -metaUrl kylin_metadata@hdfs,path=hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/kylin/kylin_metadata/kylin-b75c7f69-2ebf-c5c3-4a6e-b01f177d911f/kylin_sales_cube/metadata -hbaseConfPath hdfs://ip-XXX-XXX-XXX-XXX.us-west-2.compute.internal:8020/kylin/kylin_metadata/kylin-b75c7f69-2ebf-c5c3-4a6e-b01f177d911f/hbase-conf.xml
          at org.apache.kylin.engine.spark.SparkExecutable.doWork(SparkExecutable.java:347)
          at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
          at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
          at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:167)
          at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
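      The submitted command passes hbase-hadoop-compat-1.4.7.jar twice in `--jars` and leaves a trailing comma before the application jar, which matches the "added multiple times to distributed cache" warning in the output. Whether this is related to the failure is not known. Below is a minimal sketch, with a hypothetical function name, for auditing a `--jars` value before it is submitted.

```python
from collections import Counter


def audit_jars_argument(jars_value):
    """Split a comma-separated --jars value and report duplicate paths
    and empty entries (for example, from a trailing comma)."""
    entries = [item.strip() for item in jars_value.split(",")]
    empty = sum(1 for item in entries if not item)
    counts = Counter(item for item in entries if item)
    duplicates = sorted(path for path, n in counts.items() if n > 1)
    return {"duplicates": duplicates, "empty_entries": empty}
```

      Called with the `--jars` value from the command above, it reports /usr/lib/hbase/lib/hbase-hadoop-compat-1.4.7.jar as a duplicate and one empty entry.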

       

      Please suggest how to troubleshoot this issue.

      Thank you and kind regards

      Carlos Molina.

        Attachments

        1. base_2020_05_25_14_29_52.zip
          269 kB
          Carlos Ignacio Molina López

            People

            • Assignee: Unassigned
            • Reporter: Carlos Ignacio Molina López (cimolinal)
            • Votes: 0
            • Watchers: 2
