Details
- Type: Bug
- Status: In Progress
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 3.0.0
- Fix Version/s: None
- Component/s: None
Description
The BasicDriverFeatureStep for Spark on Kubernetes uploads the files/jars specified by --files/--jars to a Hadoop-compatible file system configured by spark.kubernetes.file.upload.path. When HADOOP_CONF_DIR is set, the spark-submit process can recognize that file system. But when using spark.kubernetes.hadoop.configMapName, the Hadoop configuration is only mounted into the pods and is never applied back to the client process, so the upload fails.
| Configuration | Result |
|---|---|
| HADOOP_CONF_DIR=/path/to/etc/hadoop | OK |
| spark.kubernetes.hadoop.configMapName=hz10-hadoop-dir | FAILED |
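The failure mode can be reproduced without Spark or Hadoop at all (a minimal sketch, not code from the report): `hz-cluster10` is an HDFS HA nameservice defined in hdfs-site.xml, not a DNS hostname, so a client that never receives that configuration ends up doing a plain hostname lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class NameserviceCheck {
    public static void main(String[] args) {
        // "hz-cluster10" is an HDFS HA nameservice defined in the cluster's
        // hdfs-site.xml. When that file is only shipped to the pods via the
        // ConfigMap, the local spark-submit client has no mapping for it and
        // effectively falls back to a plain hostname lookup, which fails:
        try {
            System.out.println("resolved: " + InetAddress.getByName("hz-cluster10"));
        } catch (UnknownHostException e) {
            System.out.println("UnknownHostException: " + e.getMessage());
        }
    }
}
```

This is the same `java.net.UnknownHostException: hz-cluster10` buried at the bottom of the stack trace in the reproduction below.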
```
Kent@KentsMacBookPro ~/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3 bin/spark-submit \
  --conf spark.kubernetes.file.upload.path=hdfs://hz-cluster10/user/kyuubi/udf \
  --jars /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar \
  --conf spark.kerberos.keytab=/Users/Kent/Downloads/kyuubi.keytab \
  --conf spark.kerberos.principal=kyuubi/dev@HADOOP.HZ.NETEASE.COM \
  --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
  --name hehe \
  --deploy-mode cluster \
  --class org.apache.spark.examples.HdfsTest \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0-SNAPSHOT.jar \
  hdfs://hz-cluster10/user/kyuubi/hive_db/kyuubi.db/hive_tbl

Listening for transport dt_socket at address: 50014
19/08/27 17:21:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/08/27 17:21:07 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
Listening for transport dt_socket at address: 50014
Exception in thread "main" org.apache.spark.SparkException: Uploading file /Users/Kent/Documents/spark-on-k8s/spark-3.0.0-SNAPSHOT-bin-2.7.3/hadoop-lzo-0.4.20-SNAPSHOT.jar failed...
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:287)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.$anonfun$uploadAndTransformFileUris$1(KubernetesUtils.scala:246)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:237)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadAndTransformFileUris(KubernetesUtils.scala:245)
	at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.$anonfun$getAdditionalPodSystemProperties$1(BasicDriverFeatureStep.scala:165)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.apache.spark.deploy.k8s.features.BasicDriverFeatureStep.getAdditionalPodSystemProperties(BasicDriverFeatureStep.scala:163)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$3(KubernetesDriverBuilder.scala:60)
	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
	at scala.collection.immutable.List.foldLeft(List.scala:89)
	at org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:58)
	at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:101)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$10(KubernetesClientApplication.scala:236)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$10$adapted(KubernetesClientApplication.scala:229)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2567)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:229)
	at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:198)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:179)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:202)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:89)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:999)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1008)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hz-cluster10
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1881)
	at org.apache.spark.deploy.k8s.KubernetesUtils$.uploadFileUri(KubernetesUtils.scala:278)
	... 30 more
Caused by: java.net.UnknownHostException: hz-cluster10
	... 43 more
```
Other related spark configurations:

```
spark.master=k8s://https://10.120.238.100:7443
# spark.master=k8s://https://10.120.238.253:7443
spark.kubernetes.container.image=harbor-inner.sparkonk8s.netease.com/tenant1-project1/spark:v3.0.0-20190813
# spark.kubernetes.driver.container.image=harbor-inner.sparkonk8s.netease.com/tenant1-project1/spark:v3.0.0-20190813
# spark.kubernetes.executor.container.image=harbor-inner.sparkonk8s.netease.com/tenant1-project1/spark:v3.0.0-20190813
spark.executor.instanses=5
spark.kubernetes.namespace=ns1
spark.kubernetes.container.image.pullSecrets=mysecret
spark.kubernetes.hadoop.configMapName=hz10-hadoop-dir
spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf
spark.kerberos.principal=kyuubi/dev@HADOOP.HZ.NETEASE.COM
spark.kerberos.keytab=/Users/Kent/Downloads/kyuubi.keytab
```
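Until the client picks up the ConfigMap's contents, one plausible workaround (an untested sketch, not part of the original report; the path below is a placeholder) is to also expose the same Hadoop configuration to the submitting process:

```shell
# Hypothetical workaround sketch: copy the cluster's core-site.xml and
# hdfs-site.xml (the same files the ConfigMap ships to the pods) onto the
# client machine, and point HADOOP_CONF_DIR at them before submitting, so
# the client can resolve the hz-cluster10 nameservice when uploading jars.
export HADOOP_CONF_DIR=/path/to/etc/hadoop
bin/spark-submit \
  --conf spark.kubernetes.file.upload.path=hdfs://hz-cluster10/user/kyuubi/udf \
  --conf spark.kubernetes.hadoop.configMapName=hz10-hadoop-dir \
  ...
```

This matches the OK row in the table above, where HADOOP_CONF_DIR on the client makes the upload succeed.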
Attachments
Issue Links
- is duplicated by
  - SPARK-28896 Spark client process is unable to upload jars to hdfs while using ConfigMap not HADOOP_CONF_DIR (Resolved)
- links to