HIVE-20134 (sub-task of HIVE-17718: Hive on Spark Debugging Improvements)

Improve logging when HoS Driver is killed due to exceeding memory limits

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Spark
    • Labels: None

    Description

      This was improved in HIVE-18093, but more can be done. If a HoS Driver gets killed because it exceeds its memory limits, YARN issues a SIGTERM to the process. The SIGTERM triggers the shutdown hook in the HoS Driver, which causes the Driver to kill all of its jobs, even those that are still running. The user ends up seeing an error like the one below, which isn't very informative. We should propagate the error from the Driver shutdown hook to the user.

      INFO : 2018-07-09 17:48:42,580 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 0(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
      INFO : 2018-07-09 17:48:44,589 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 1(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
      INFO : 2018-07-09 17:48:45,591 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
      INFO : 2018-07-09 17:48:48,596 Stage-64_0: 526/526 Finished Stage-65_0: 1405/1405 Finished Stage-66_0: 2(+759)/1102 Stage-67_0: 0/1099 Stage-68_0: 0/1099 Stage-69_0: 0/1
      ERROR : Spark job[23] failed
      java.lang.InterruptedException: null
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) ~[?:1.8.0_141]
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) ~[?:1.8.0_141]
      at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202) ~[scala-library-2.11.8.jar:?]
      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) ~[scala-library-2.11.8.jar:?]
      at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153) ~[scala-library-2.11.8.jar:?]
      at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:125) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.spark.SimpleFutureAction.ready(FutureAction.scala:114) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.spark.util.ThreadUtils$.awaitReady(ThreadUtils.scala:222) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.spark.JavaFutureActionWrapper.getImpl(FutureAction.scala:264) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.spark.JavaFutureActionWrapper.get(FutureAction.scala:277) ~[spark-core_2.11-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
      at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:391) ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
      at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:352) ~[hive-exec-2.1.1-SNAPSHOT.jar:2.1.1-SNAPSHOT]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
      at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
      ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. null
      INFO : Completed executing command(queryId=hive_20180709174140_0f64ee17-f793-441a-9a77-3ee0cd0a9c32); Time taken: 249.727 seconds
      Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. null (state=08S01,code=1)
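
      One way to make this failure more informative (a minimal sketch, not Hive's actual implementation): have the Driver's shutdown hook record why it is shutting down and attach that reason to the error reported for any in-flight jobs, so the client sees a message such as "Spark job cancelled: Driver received SIGTERM, possibly killed by YARN for exceeding container memory limits" instead of the bare InterruptedException above. Every class, field, and message below is hypothetical; the sketch only mirrors the shutdown-hook / job-wrapper structure visible in the stack trace.

      import java.util.concurrent.ExecutionException;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;
      import java.util.concurrent.Future;
      import java.util.concurrent.TimeUnit;
      import java.util.concurrent.atomic.AtomicReference;

      /**
       * Illustrative sketch (not Hive code): propagate the reason for a driver
       * shutdown to the error seen by callers of in-flight jobs.
       */
      public class DriverShutdownSketch {

        // Reason recorded by the shutdown hook; null while no shutdown is in progress.
        private static final AtomicReference<String> SHUTDOWN_REASON = new AtomicReference<>();

        public static void main(String[] args) {
          ExecutorService jobPool = Executors.newSingleThreadExecutor();

          // Runs when the process receives SIGTERM (e.g. from the YARN NodeManager).
          Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            SHUTDOWN_REASON.set("Driver received SIGTERM, possibly killed by YARN "
                + "for exceeding container memory limits");
            jobPool.shutdownNow(); // interrupts the running job wrapper below
            try {
              // Give the interrupted job a moment to report the enriched error
              // before the JVM halts.
              jobPool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException ignored) {
              Thread.currentThread().interrupt();
            }
          }));

          // Stand-in for a job wrapper waiting on a long-running Spark job.
          Future<?> job = jobPool.submit(() -> {
            try {
              Thread.sleep(60_000); // simulate waiting for the Spark future
            } catch (InterruptedException e) {
              // Instead of surfacing a bare InterruptedException, wrap it with
              // the recorded shutdown reason so the client sees why the job died.
              String reason = SHUTDOWN_REASON.get();
              throw new RuntimeException(
                  reason != null ? "Spark job cancelled: " + reason : "Spark job interrupted", e);
            }
            return null;
          });

          try {
            job.get();
            System.out.println("Job finished normally");
          } catch (ExecutionException e) {
            // This message is what would be propagated to the client instead of
            // "java.lang.InterruptedException: null".
            System.err.println("Spark job failed: " + e.getCause().getMessage());
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          jobPool.shutdown();
        }
      }

      Running this and sending the process a SIGTERM (kill -15 <pid>) prints the enriched message rather than a raw InterruptedException. In Hive, the analogous place to attach such a reason would presumably be where the RemoteDriver shutdown hook cancels its running JobWrappers, so the SparkTask error shown to the user carries the shutdown cause instead of "null".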

          People

            Assignee: Unassigned
            Reporter: Sahil Takiar (stakiar)
