Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version: v2.5.0
- Component: None
Description
In KYLIN-2998, we modified SparkExecutable to check the Spark job's status from Yarn in "onExecuteStart", to avoid repeatedly submitting the same job to Hadoop.
Today I found that this doesn't work. The Spark job failed:
18/09/11 06:39:03 INFO execution.SparkSqlParser: Parsing command: default.kylin_intermediate_kylin_sales_cube_clone_clone_ea7091c0_a057_b584_4ccf_2368ec5e9f55
18/09/11 06:39:03 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=kylin_intermediate_kylin_sales_cube_clone_clone_ea7091c0_a057_b584_4ccf_2368ec5e9f55
18/09/11 06:39:03 INFO HiveMetaStore.audit: ugi=root ip=unknown-ip-addr cmd=get_table : db=default tbl=kylin_intermediate_kylin_sales_cube_clone_clone_ea7091c0_a057_b584_4ccf_2368ec5e9f55
Exception in thread "main" java.lang.RuntimeException: error execute org.apache.kylin.engine.spark.SparkFactDistinct. Root cause: Table or view 'kylin_intermediate_kylin_sales_cube_clone_clone_ea7091c0_a057_b584_4ccf_2368ec5e9f55' not found in database 'default';
    at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:42)
    at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:744)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'kylin_intermediate_kylin_sales_cube_clone_clone_ea7091c0_a057_b584_4ccf_2368ec5e9f55' not found in database 'default';
    at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
    at org.apache.spark.sql.hive.client.HiveClient$$anonfun$getTable$1.apply(HiveClient.scala:74)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.hive.client.HiveClient$class.getTable(HiveClient.scala:74)
    at org.apache.spark.sql.hive.client.HiveClientImpl.getTable(HiveClientImpl.scala:78)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable$1.apply(HiveExternalCatalog.scala:118)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
    at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$getRawTable(HiveExternalCatalog.scala:117)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$getTable$1.apply(HiveExternalCatalog.scala:628)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
    at org.apache.spark.sql.hive.HiveExternalCatalog.getTable(HiveExternalCatalog.scala:627)
    at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:124)
    at org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:70)
    at org.apache.spark.sql.SparkSession.table(SparkSession.scala:586)
    at org.apache.spark.sql.SparkSession.table(SparkSession.scala:582)
    at org.apache.kylin.engine.spark.SparkUtil.hiveRecordInputRDD(SparkUtil.java:157)
    at org.apache.kylin.engine.spark.SparkFactDistinct.execute(SparkFactDistinct.java:186)
    at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
    ... 10 more
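For what it's worth, the failing call is Spark resolving Kylin's flat intermediate Hive table (SparkUtil.hiveRecordInputRDD ends up in SparkSession.table, per the stack trace). A minimal sketch of the same lookup, with the table name taken from the log and everything else illustrative:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadIntermediateTable {
    public static void main(String[] args) {
        // Hive support is needed so spark.table() resolves names through the
        // Hive metastore, which is where the lookup above fails.
        SparkSession spark = SparkSession.builder()
                .appName("read-kylin-intermediate-table")
                .enableHiveSupport()
                .getOrCreate();

        // Throws NoSuchTableException when the intermediate table no longer
        // exists in the metastore, matching the stack trace above.
        Dataset<Row> flatTable = spark.table(
                "default.kylin_intermediate_kylin_sales_cube_clone_clone_ea7091c0_a057_b584_4ccf_2368ec5e9f55");
        flatTable.show(10);
        spark.stop();
    }
}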
But on Yarn, this job shows as succeeded:
2018-09-11 06:40:06,876 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Application Report :
2018-09-11 06:40:06,876 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Application-Id : application_1533021510235_1085
2018-09-11 06:40:06,876 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Application-Name : Fact distinct columns for:kylin_sales_cube_clone_clone segment ea7091c0-a057-b584-4ccf-2368ec5e9f55
2018-09-11 06:40:06,876 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Application-Type : SPARK
2018-09-11 06:40:06,876 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : User : root
2018-09-11 06:40:06,876 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Queue : default
2018-09-11 06:40:06,876 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Start-Time : 1536647893696
2018-09-11 06:40:06,876 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Finish-Time : 1536647943612
2018-09-11 06:40:06,876 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Progress : 100%
2018-09-11 06:40:06,876 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : State : FINISHED
2018-09-11 06:40:06,877 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Final-State : SUCCEEDED
2018-09-11 06:40:06,877 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Tracking-URL : N/A
2018-09-11 06:40:06,877 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : RPC Port : 0
2018-09-11 06:40:06,877 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : AM Host : 10.1.2.137
2018-09-11 06:40:06,877 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Aggregate Resource Allocation : 234749 MB-seconds, 140 vcore-seconds
2018-09-11 06:40:06,877 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Log Aggregation Status : SUCCEEDED
2018-09-11 06:40:06,877 INFO [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-92] spark.SparkExecutable:38 : Diagnostics :
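For context, the resume check from KYLIN-2998 amounts to asking Yarn for the application's final status, along the lines of the sketch below. This is a minimal illustration using the plain Hadoop YarnClient API; the class and method names are mine, not Kylin's exact code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class YarnStatusCheck {

    // Asks the ResourceManager for the application report and returns true
    // when the final state is SUCCEEDED. As the report above shows, the final
    // state can be SUCCEEDED even though the Spark driver died with an
    // exception, so the final state alone does not prove the step's output exists.
    static boolean isSucceededOnYarn(String appIdStr) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new Configuration());
        yarnClient.start();
        try {
            ApplicationId appId = ConverterUtils.toApplicationId(appIdStr);
            ApplicationReport report = yarnClient.getApplicationReport(appId);
            return report.getFinalApplicationStatus() == FinalApplicationStatus.SUCCEEDED;
        } finally {
            yarnClient.stop();
        }
    }
}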
So when the job was resumed, Kylin thought the Spark step had succeeded and moved ahead. But the next step then detected that no file had been generated by the previous step:
2018-09-11 06:40:12,221 ERROR [Scheduler 1854232624 Job a3e39298-8dc3-21f2-cf16-0aa5e451c777-100] steps.SaveStatisticsStep:149 : fail to save cuboid statistics
java.io.IOException: fail to find the statistics file in base dir: /kylin/kylin_default_instance/kylin-a3e39298-8dc3-21f2-cf16-0aa5e451c777/kylin_sales_cube_clone_clone/fact_distinct_columns/statistics
    at org.apache.kylin.engine.mr.steps.SaveStatisticsStep.doWork(SaveStatisticsStep.java:78)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:163)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:69)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:163)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
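The sanity check that caught the problem is essentially an HDFS existence test on the previous step's output. A minimal sketch with the Hadoop FileSystem API, where the directory layout follows the log above and the helper name is mine:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StatisticsCheck {

    // Fails fast when the fact-distinct step produced no statistics files,
    // mirroring the IOException raised by SaveStatisticsStep above.
    static void assertStatisticsExist(Configuration conf, String baseDir) throws IOException {
        Path statisticsDir = new Path(baseDir);
        FileSystem fs = statisticsDir.getFileSystem(conf);
        FileStatus[] files = fs.exists(statisticsDir) ? fs.listStatus(statisticsDir) : new FileStatus[0];
        if (files.length == 0) {
            throw new IOException("fail to find the statistics file in base dir: " + baseDir);
        }
    }
}

This is why the job fails only at SaveStatisticsStep rather than at the Spark step itself: the Spark step's status came from Yarn, while this step looks at the actual files on HDFS.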