
SPARK-19835: Running CTAS SQL on Spark 2.0.1 hangs


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 2.0.1
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None
    • Environment: Spark 2.0.1, Hadoop 2.6.0, JDK 1.7

    Description

      When I run a CTAS SQL statement such as:
      ---------------------
      create table A as
      select a.id, b.id, c.id from
      (select * from B) a
      left join
      (select * from C) b
      left join
      (select id from D group by id) c
      on a.id =b.id and a.id =c.id;
      -----------------------------------------------------------------
      the job hangs and makes no further progress, with no errors returned. The last few INFO messages are as follows:
      ---------------
      17/03/06 18:57:47 INFO spark.SparkContext: Starting job: processCmd at CliDriver.java:376
      17/03/06 18:57:47 INFO scheduler.DAGScheduler: Got job 2 (processCmd at CliDriver.java:376) with 2 output partitions
      17/03/06 18:57:47 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (processCmd at CliDriver.java:376)
      17/03/06 18:57:47 INFO scheduler.DAGScheduler: Parents of final stage: List()
      17/03/06 18:57:47 INFO scheduler.DAGScheduler: Missing parents: List()
      17/03/06 18:57:47 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[20] at processCmd at CliDriver.java:376), which has no missing parents
      17/03/06 18:57:47 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 150.1 KB, free 3.7 GB)
      17/03/06 18:57:47 INFO memory.MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 55.0 KB, free 3.7 GB)
      17/03/06 18:57:47 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on 10.87.101.151:64500 (size: 55.0 KB, free: 3.8 GB)
      17/03/06 18:57:47 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1012
      17/03/06 18:57:47 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 3 (MapPartitionsRDD[20] at processCmd at CliDriver.java:376)
      17/03/06 18:57:47 INFO cluster.YarnScheduler: Adding task set 3.0 with 2 tasks
      17/03/06 18:57:47 INFO scheduler.FairSchedulableBuilder: Added task set TaskSet_3 tasks to pool default
      17/03/06 18:57:47 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 204, hadoop491.dx.momo.com, partition 0, RACK_LOCAL, 5824 bytes)
      17/03/06 18:57:47 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 204 on executor id: 50 hostname: hadoop491.dx.momo.com.
      17/03/06 18:57:48 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on hadoop491.dx.momo.com:33300 (size: 55.0 KB, free: 2.8 GB)
      17/03/06 18:57:48 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop491.dx.momo.com:33300 (size: 42.7 KB, free: 2.8 GB)
      17/03/06 18:57:50 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on hadoop491.dx.momo.com:33300 (size: 1678.6 KB, free: 2.8 GB)
      17/03/06 18:57:50 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on hadoop491.dx.momo.com:33300 (size: 3.8 MB, free: 2.8 GB)
      17/03/06 18:57:51 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 205, hadoop605.dx.momo.com, partition 1, ANY, 5824 bytes)
      17/03/06 18:57:51 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Launching task 205 on executor id: 9 hostname: hadoop605.dx.momo.com.
      17/03/06 18:57:51 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on hadoop605.dx.momo.com:37394 (size: 55.0 KB, free: 2.8 GB)
      17/03/06 18:57:52 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on hadoop605.dx.momo.com:37394 (size: 42.7 KB, free: 2.8 GB)
      17/03/06 18:57:54 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on hadoop605.dx.momo.com:37394 (size: 1678.6 KB, free: 2.8 GB)
      17/03/06 18:57:54 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on hadoop605.dx.momo.com:37394 (size: 3.8 MB, free: 2.8 GB)
      17/03/06 19:05:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on 10.87.101.151:64500 in memory (size: 10.0 KB, free: 3.8 GB)
      17/03/06 19:05:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on hadoop572.dx.momo.com:22730 in memory (size: 10.0 KB, free: 2.8 GB)
      ------------------------------------------------------------------------------------------------------
      When I press CTRL+C to cancel the job, the following stack trace is printed:
      ------
      org.apache.spark.SparkException: Job 2 cancelled as part of cancellation of all jobs
      at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)
      at org.apache.spark.scheduler.DAGScheduler.handleJobCancellation(DAGScheduler.scala:1393)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply$mcVI$sp(DAGScheduler.scala:725)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:725)
      at org.apache.spark.scheduler.DAGScheduler$$anonfun$doCancelAllJobs$1.apply(DAGScheduler.scala:725)
      at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
      at org.apache.spark.scheduler.DAGScheduler.doCancelAllJobs(DAGScheduler.scala:725)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1645)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622)
      at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611)
      at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
      at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1890)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1903)
      at org.apache.spark.SparkContext.runJob(SparkContext.scala:1923)
      at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.saveAsHiveFile(InsertIntoHiveTable.scala:132)
      at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:234)
      at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:143)
      at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:310)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
      at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
      at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
      at org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:93)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
      at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
      at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
      at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
      at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
      at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
      at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
      at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
      at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
      at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
      at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
      at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:687)
      at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:682)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:331)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:247)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:740)
      at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:167)
      at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:165)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
      at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:165)
      at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      ----------------------------------------------------------------------------------
      Part of the task thread dump is as follows:
      -------
      91 Executor task launch worker-0 RUNNABLE
      96 DataStreamer for file /tmp/spark/stage-datapm_hive_2017-03-06_18-57-40_470_3468784292858131323-1/-ext-10000/_temporary/0/_temporary/attempt_201703061857_0003_m_000000_0/part-00000 block BP-454299492-10.84.100.171-1416301904728:blk_3772186281_2703877735 TIMED_WAITING
      48 dispatcher-event-loop-0 WAITING
      49 dispatcher-event-loop-1 WAITING
      ------------------------------------------------------------------------------------------
      The details of thread ID 91 are:
      --------------------------------
      java.io.DataOutputStream.write(DataOutputStream.java:107)
      org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.write(HiveIgnoreKeyTextOutputFormat.java:87)
      org.apache.spark.sql.hive.SparkHiveWriterContainer$$anonfun$writeToFile$1.apply(hiveWriterContainers.scala:191)
      org.apache.spark.sql.hive.SparkHiveWriterContainer$$anonfun$writeToFile$1.apply(hiveWriterContainers.scala:185)
      scala.collection.Iterator$class.foreach(Iterator.scala:893)
      org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.foreach(WholeStageCodegenExec.scala:368)
      org.apache.spark.sql.hive.SparkHiveWriterContainer.writeToFile(hiveWriterContainers.scala:185)
      org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:132)
      org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:132)
      org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
      org.apache.spark.scheduler.Task.run(Task.scala:86)
      org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      java.lang.Thread.run(Thread.java:745)
      ------------------------------------------------------------------------------------------------------------
      The details of thread ID 461 are:
      ResponseProcessor for block BP-454299492-10.84.100.171-1416301904728:blk_3772682244_2704373780 RUNNABLE
      sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
      sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
      sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
      sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
      sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
      org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
      org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
      org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
      org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
      org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
      java.io.FilterInputStream.read(FilterInputStream.java:83)
      java.io.FilterInputStream.read(FilterInputStream.java:83)
      org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2201)
      org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
      org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)
      ----------------------------------------------------------------------------------------------
      From the Spark UI: I allocated 400 cores to the job, but only 2 tasks are alive. The failure reason for a failed task is as follows:
      org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/spark/stage-hadoop_hive_2017-03-06_18-21-10_555_7794446277054555688-2/-ext-10000/_temporary/0/_temporary/attempt_201703061821_0131_m_000001_0/part-00001 (inode 2257822862): File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_1227261476_92, pendingcreates: 1]

      ---------------
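      One possible explanation, consistent with the writer thread staying RUNNABLE and with the eventual "Not A Problem" resolution: the query contains two LEFT JOINs but only a single ON clause. If the parser attaches that ON clause to the last join (as HiveQL-style grammars commonly do), the join between a and b is evaluated without any condition and degenerates into a cross join over B and C, which can easily produce enough rows to keep the two writer tasks busy for a very long time. A hedged sketch of the same query with one ON clause per join (the column aliases are my addition, to avoid duplicate output column names):
      ---------------
      -- Sketch only, not the original query: each LEFT JOIN gets its own
      -- ON clause, so no join runs unconstrained. Table names B, C, D are
      -- taken from the report above.
      create table A as
      select a.id as b_id, b.id as c_id, c.id as d_id
      from (select * from B) a
      left join (select * from C) b
        on a.id = b.id
      left join (select id from D group by id) c
        on a.id = c.id;
      ---------------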
      I wonder whether someone else can reproduce this issue; if it can be reproduced, this may be a bug.
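      A quick way to test this theory without running the full CTAS is to EXPLAIN the SELECT part first; a BroadcastNestedLoopJoin or CartesianProduct node in the physical plan would indicate that one of the joins is being evaluated without a join condition:
      ---------------
      -- A sketch: inspect the physical plan in spark-sql before writing
      -- any data. An unconstrained join shows up as BroadcastNestedLoopJoin
      -- or CartesianProduct instead of a SortMergeJoin/BroadcastHashJoin.
      EXPLAIN
      select a.id, b.id, c.id from
      (select * from B) a
      left join
      (select * from C) b
      left join
      (select id from D group by id) c
      on a.id = b.id and a.id = c.id;
      ---------------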

          People

            Assignee: Unassigned
            Reporter: bruce xu (xwc3504)
            Votes: 0
            Watchers: 2
