Details
Type: Bug
Status: Resolved
Priority: Trivial
Resolution: Not A Bug
Affects Version: 3.0.1
Fix Version: None
Component: None
Environment:
Amazon EMR: emr-6.2.0
Spark Version: Spark 3.0.1
Instance Type: g3.4xlarge
AMI Name: emr-6_2_0-image-builder-ami-hvm-x86_64 2020-11-01T00-56-10.917Z
Spark Configs:
sc_conf = SparkConf() \
    .set('spark.driver.resource.gpu.discoveryScript', '/opt/spark/getGpusResources.sh') \
    .set('spark.driver.resource.gpu.amount', '1') \
    .set('spark.rapids.sql.enabled', 'ALL')
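For context, the configuration above only requests a GPU for the driver. With Spark 3.x resource scheduling on YARN, executor- and task-level GPU settings are usually set as well. The sketch below lists the relevant property names as a plain dict for illustration; the key names come from the Spark 3.x resource-scheduling configuration, but the executor/task entries are an assumption about a typical setup, not the reporter's exact configuration:

```python
# Illustrative only: GPU-related Spark properties as a plain dict.
# The driver-side entries mirror the report's setup; the executor/task
# entries are the additional properties Spark's GPU scheduling on YARN
# normally expects (assumed here, not taken from the report).
gpu_conf = {
    # driver-side GPU request (as in the report)
    'spark.driver.resource.gpu.amount': '1',
    'spark.driver.resource.gpu.discoveryScript': '/opt/spark/getGpusResources.sh',
    # executor/task-side GPU requests, typically needed as well
    'spark.executor.resource.gpu.amount': '1',
    'spark.task.resource.gpu.amount': '1',
    'spark.executor.resource.gpu.discoveryScript': '/opt/spark/getGpusResources.sh',
}

# Print the properties in the form they would appear in spark-defaults.conf.
for key, value in sorted(gpu_conf.items()):
    print(f'{key}={value}')
```

These values could then be applied via `SparkConf().setAll(gpu_conf.items())` in the same way as the reporter's `.set()` calls.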
Description
An error occurs when executing Spark on GPU. The stack trace is below:
20/12/14 18:39:41 WARN ResourceRequestHelper: YARN doesn't know about resource yarn.io/gpu, your resource discovery has to handle properly discovering and isolating the resource!
Error: The resource manager encountered a problem that should not occur under normal circumstances. Please report this error to the Hadoop community by opening a JIRA ticket at http://issues.apache.org/jira and including the following information:
* Resource type requested: yarn.io/gpu
* Resource object: <memory:896, vCores:1>
* The stack trace for this exception:
java.lang.Exception
    at org.apache.hadoop.yarn.exceptions.ResourceNotFoundException.<init>(ResourceNotFoundException.java:47)
    at org.apache.hadoop.yarn.api.records.Resource.getResourceInformation(Resource.java:268)
    at org.apache.hadoop.yarn.api.records.impl.pb.ResourcePBImpl.setResourceInformation(ResourcePBImpl.java:198)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ResourceRequestHelper$.$anonfun$setResourceRequests$4(ResourceRequestHelper.scala:183)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:128)
    at org.apache.spark.deploy.yarn.ResourceRequestHelper$.setResourceRequests(ResourceRequestHelper.scala:170)
    at org.apache.spark.deploy.yarn.Client.createApplicationSubmissionContext(Client.scala:277)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:196)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:60)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:201)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
After encountering this error, the resource manager is in an inconsistent state. It is safe for the resource manager to be restarted as the error encountered should be transitive. If high availability is enabled, failing over to a standby resource manager is also safe.
20/12/14 18:39:46 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
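The WARN line at the top points at the likely cause: the YARN ResourceManager has not been configured to recognize the yarn.io/gpu resource type, so Spark's request for it fails inside setResourceRequests. On a hand-configured Hadoop 3.1+ cluster, GPU scheduling is typically enabled roughly as sketched below; this is an assumption based on the Hadoop YARN GPU documentation, not something the ticket confirms, and on EMR these properties are normally managed through the cluster's configuration classifications rather than edited by hand:

```
<!-- resource-types.xml (ResourceManager): declare the GPU resource type -->
<configuration>
  <property>
    <name>yarn.resource-types</name>
    <value>yarn.io/gpu</value>
  </property>
</configuration>

<!-- yarn-site.xml (NodeManagers): enable the GPU resource plugin -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource-plugins</name>
    <value>yarn.io/gpu</value>
  </property>
</configuration>
```

With the resource type declared on the ResourceManager side, the ResourceNotFoundException above should no longer be raised when Spark submits a container request that includes yarn.io/gpu.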