SPARK-34684

Hadoop config could not be successfully serialized from driver pods to executor pods


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 3.0.1, 3.0.2
    • Fix Version/s: None
    • Component/s: Kubernetes
    • Labels: None

    Description

      I have set HADOOP_CONF_DIR correctly, and I have verified that the Hadoop configs are stored in a ConfigMap and mounted to the driver. However, the SparkPi example job keeps failing because the executors do not know how to talk to HDFS. I strongly suspect a bug here: when I manually create a ConfigMap holding the Hadoop configs and mount it to the executors via the pod template file, the error goes away.
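
      For reference, a minimal sketch of what I added to the executor template. The ConfigMap name hadoop-conf, the mount path /etc/hadoop/conf, and the container name are placeholders, not the exact values from my cluster:

      apiVersion: v1
      kind: Pod
      spec:
        containers:
          - name: spark-kubernetes-executor   # assumed executor container name
            env:
              - name: HADOOP_CONF_DIR         # point the Hadoop client at the mounted configs
                value: /etc/hadoop/conf
            volumeMounts:
              - name: hadoop-conf
                mountPath: /etc/hadoop/conf
        volumes:
          - name: hadoop-conf
            configMap:
              name: hadoop-conf               # ConfigMap holding core-site.xml, hdfs-site.xml, etc.

      The ConfigMap itself was created beforehand from the contents of HADOOP_CONF_DIR (e.g. kubectl create configmap hadoop-conf --from-file=$HADOOP_CONF_DIR -n test).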

       

      Spark submit command:

      /opt/spark-3.0/bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --deploy-mode cluster \
        --master k8s://https://10.***.18.96:6443 \
        --num-executors 1 \
        --conf spark.kubernetes.namespace=test \
        --conf spark.kubernetes.container.image=**** \
        --conf spark.kubernetes.driver.podTemplateFile=/opt/spark-3.0/conf/spark-driver.template \
        --conf spark.kubernetes.executor.podTemplateFile=/opt/spark-3.0/conf/spark-executor.template \
        --conf spark.kubernetes.file.upload.path=/opt/spark-3.0/examples/jars \
        hdfs:///tmp/spark-examples_2.12-3.0.125067.jar 1000


      Error log:

       

      21/03/10 06:59:58 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 608 ms (392 ms spent in bootstraps)
      21/03/10 06:59:58 INFO SecurityManager: Changing view acls to: root
      21/03/10 06:59:58 INFO SecurityManager: Changing modify acls to: root
      21/03/10 06:59:58 INFO SecurityManager: Changing view acls groups to:
      21/03/10 06:59:58 INFO SecurityManager: Changing modify acls groups to:
      21/03/10 06:59:58 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
      21/03/10 06:59:59 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 130 ms (104 ms spent in bootstraps)
      21/03/10 06:59:59 INFO DiskBlockManager: Created local directory at /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/blockmgr-981cfb62-5b27-4d1a-8fbd-eddb466faf1d
      21/03/10 06:59:59 INFO MemoryStore: MemoryStore started with capacity 2047.2 MiB
      21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078
      21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
      21/03/10 06:59:59 INFO ResourceUtils: Resources for spark.executor:

      21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
      21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
      21/03/10 06:59:59 INFO Executor: Starting executor ID 1 on host 100.64.0.192
      21/03/10 07:00:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37956.
      21/03/10 07:00:00 INFO NettyBlockTransferService: Server created on 100.64.0.192:37956
      21/03/10 07:00:00 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
      21/03/10 07:00:00 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
      21/03/10 07:00:00 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
      21/03/10 07:00:00 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, 100.64.0.192, 37956, None)
      21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 0
      21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 1
      21/03/10 07:00:01 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
      21/03/10 07:00:01 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
      21/03/10 07:00:01 INFO Executor: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587432
      21/03/10 07:00:01 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 65 ms (58 ms spent in bootstraps)
      21/03/10 07:00:01 INFO Utils: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar to /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/fetchFileTemp12837078937383244276.tmp
      21/03/10 07:00:01 INFO Utils: Copying /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/-3355581251615359587432_cache to /opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar
      21/03/10 07:00:01 INFO Executor: Adding file:/opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar to class loader
      21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
      21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
      21/03/10 07:00:01 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
      java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
      at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
      at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
      at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
      at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
      at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
      at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
      at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
      at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
      at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
      at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
      at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
      at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.base/java.lang.Thread.run(Unknown Source)

       21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2
      21/03/10 07:00:01 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
      21/03/10 07:00:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
      java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
      at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
      at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
      at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
      at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
      at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
      at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
      at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
      at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
      at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
      at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
      at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
      at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.base/java.lang.Thread.run(Unknown Source)
      21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
      21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 3
      21/03/10 07:00:01 INFO Executor: Running task 1.1 in stage 0.0 (TID 3)
      21/03/10 07:00:01 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
      java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
      at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
      at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
      at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
      at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
      at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
      at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
      at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
      at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
      at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
      at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
      at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
      at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.base/java.lang.Thread.run(Unknown Source)
      21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
      21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 4
      21/03/10 07:00:01 INFO Executor: Running task 0.1 in stage 0.0 (TID 4)
      21/03/10 07:00:01 ERROR Executor: Exception in task 1.1 in stage 0.0 (TID 3)
      java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
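
      The "Incomplete HDFS URI, no host" error suggests the executor never sees fs.defaultFS from core-site.xml, so it cannot resolve the host-less hdfs:/// URI. For completeness, a sketch of the kind of ConfigMap that fixes it when mounted via the executor template above (the namenode address is a placeholder):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: hadoop-conf
        namespace: test
      data:
        core-site.xml: |
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <!-- placeholder namenode address -->
              <value>hdfs://namenode.example.com:8020</value>
            </property>
          </configuration>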

          People

            Assignee: Unassigned
            Reporter: Yue Peng (ypeng65)
            Votes: 0
            Watchers: 3
