Spark / SPARK-30529

Improve error messages when Executor dies before registering with driver

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Spark Core
    • Labels: None

    Description

      Currently, when you give the executor a bad configuration for accelerator-aware scheduling, the executors can die, but it's hard for the user to know why. The executor dies and logs what went wrong in its log files, but those logs are often hard to find because the executor hasn't registered yet. Since it hasn't registered, the executor doesn't show up in the UI, so there is no easy way to get to its log files.

      One specific example is when you give a discovery script that doesn't find all the GPUs:

      20/01/16 08:59:24 INFO YarnCoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.28.9.112:44403
      20/01/16 08:59:24 ERROR Inbox: Ignoring error
      java.lang.IllegalArgumentException: requirement failed: Resource: gpu, with addresses: 0 is less than what the user requested: 2)
       at scala.Predef$.require(Predef.scala:281)
       at org.apache.spark.resource.ResourceUtils$.$anonfun$assertAllResourceAllocationsMatchResourceProfile$1(ResourceUtils.scala:251)
       at org.apache.spark.resource.ResourceUtils$.$anonfun$assertAllResourceAllocationsMatchResourceProfile$1$adapted(ResourceUtils.scala:248)
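
      To make the failure above easier to reproduce outside the executor, here is a minimal Python sketch of the same kind of check. It assumes the discovery script prints a JSON object with a "name" and an array of "addresses", which is the format Spark documents for discovery scripts. The function name check_discovered_resource and the sample output are hypothetical, and this is not Spark's actual implementation, only an illustration of why a script that reports address 0 alone fails a request for 2 GPUs.

```python
import json
from typing import List


def check_discovered_resource(script_output: str, requested: int) -> List[str]:
    """Parse discovery-script JSON and verify that enough addresses were found.

    Hypothetical helper for illustration; it only mirrors the shape of the
    requirement that failed in the log above.
    """
    info = json.loads(script_output)
    name = info["name"]
    addresses = info["addresses"]
    if len(addresses) < requested:
        raise ValueError(
            f"Resource: {name}, with addresses: {','.join(addresses)} "
            f"is less than what the user requested: {requested}"
        )
    return addresses


# Assumed output of a discovery script that found only one of two GPUs.
output = '{"name": "gpu", "addresses": ["0"]}'
try:
    check_discovered_resource(output, requested=2)
except ValueError as err:
    print(f"discovery check failed: {err}")
```

      Running the discovery script by hand on the worker node and feeding its output to a check like this is one way to spot the mismatch without having to locate the logs of an executor that never registered.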

      Figure out a better way of logging, or otherwise letting the user know, what error occurred when the executor dies before registering.

            People

              Assignee: tgraves Thomas Graves
              Reporter: tgraves Thomas Graves