SPARK-34684

Hadoop config could not be successfully serialized from driver pods to executor pods


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Problem
    • Affects Version/s: 3.0.1, 3.0.2
    • Fix Version/s: None
    • Component/s: Kubernetes
    • Labels: None

    Description

      I have set HADOOP_CONF_DIR correctly, and I have verified that the Hadoop configs are stored in a ConfigMap and mounted to the driver. However, the SparkPi example job keeps failing because the executors do not know how to talk to HDFS. I strongly suspect a bug here: when I manually create a ConfigMap holding the Hadoop configs and mount it to the executors via the pod template file, the error goes away.
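
      For reference, a minimal sketch of what I added to the executor template. The ConfigMap name hadoop-conf, the mount path /etc/hadoop/conf, and the container name are placeholders, not the exact values from my cluster:

      apiVersion: v1
      kind: Pod
      spec:
        containers:
          - name: spark-kubernetes-executor   # assumed executor container name
            env:
              - name: HADOOP_CONF_DIR         # point the Hadoop client at the mounted configs
                value: /etc/hadoop/conf
            volumeMounts:
              - name: hadoop-conf
                mountPath: /etc/hadoop/conf
        volumes:
          - name: hadoop-conf
            configMap:
              name: hadoop-conf               # ConfigMap holding core-site.xml, hdfs-site.xml, etc.

      The ConfigMap itself was created beforehand from the contents of HADOOP_CONF_DIR (e.g. kubectl create configmap hadoop-conf --from-file=$HADOOP_CONF_DIR -n test).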

       

      Spark submit command:

      /opt/spark-3.0/bin/spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --deploy-mode cluster \
        --master k8s://https://10.***.18.96:6443 \
        --num-executors 1 \
        --conf spark.kubernetes.namespace=test \
        --conf spark.kubernetes.container.image=**** \
        --conf spark.kubernetes.driver.podTemplateFile=/opt/spark-3.0/conf/spark-driver.template \
        --conf spark.kubernetes.executor.podTemplateFile=/opt/spark-3.0/conf/spark-executor.template \
        --conf spark.kubernetes.file.upload.path=/opt/spark-3.0/examples/jars \
        hdfs:///tmp/spark-examples_2.12-3.0.125067.jar 1000


      Error log:

       

      21/03/10 06:59:58 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 608 ms (392 ms spent in bootstraps)
      21/03/10 06:59:58 INFO SecurityManager: Changing view acls to: root
      21/03/10 06:59:58 INFO SecurityManager: Changing modify acls to: root
      21/03/10 06:59:58 INFO SecurityManager: Changing view acls groups to:
      21/03/10 06:59:58 INFO SecurityManager: Changing modify acls groups to:
      21/03/10 06:59:58 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
      21/03/10 06:59:59 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 130 ms (104 ms spent in bootstraps)
      21/03/10 06:59:59 INFO DiskBlockManager: Created local directory at /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/blockmgr-981cfb62-5b27-4d1a-8fbd-eddb466faf1d
      21/03/10 06:59:59 INFO MemoryStore: MemoryStore started with capacity 2047.2 MiB
      21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078
      21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
      21/03/10 06:59:59 INFO ResourceUtils: Resources for spark.executor:

      21/03/10 06:59:59 INFO ResourceUtils: ==============================================================
      21/03/10 06:59:59 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
      21/03/10 06:59:59 INFO Executor: Starting executor ID 1 on host 100.64.0.192
      21/03/10 07:00:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37956.
      21/03/10 07:00:00 INFO NettyBlockTransferService: Server created on 100.64.0.192:37956
      21/03/10 07:00:00 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
      21/03/10 07:00:00 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
      21/03/10 07:00:00 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, 100.64.0.192, 37956, None)
      21/03/10 07:00:00 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, 100.64.0.192, 37956, None)
      21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 0
      21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 1
      21/03/10 07:00:01 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
      21/03/10 07:00:01 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
      21/03/10 07:00:01 INFO Executor: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587432
      21/03/10 07:00:01 INFO TransportClientFactory: Successfully created connection to org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc/100.64.0.191:7078 after 65 ms (58 ms spent in bootstraps)
      21/03/10 07:00:01 INFO Utils: Fetching spark://org-apache-spark-examples-sparkpi-0e58b6781aeef2d5-driver-svc.test.svc:7078/jars/spark-examples_2.12-3.0.125067.jar to /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/fetchFileTemp12837078937383244276.tmp
      21/03/10 07:00:01 INFO Utils: Copying /var/data/spark-0f541e3d-994f-4c7a-843f-f7dac57dfc13/spark-1b32a101-9bf6-4836-a243-bd853253e85f/-3355581251615359587432_cache to /opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar
      21/03/10 07:00:01 INFO Executor: Adding file:/opt/spark/work-dir/./spark-examples_2.12-3.0.125067.jar to class loader
      21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
      21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
      21/03/10 07:00:01 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
      java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
      at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
      at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
      at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
      at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
      at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
      at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
      at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
      at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
      at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
      at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
      at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
      at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.base/java.lang.Thread.run(Unknown Source)

       21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 2
      21/03/10 07:00:01 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
      21/03/10 07:00:01 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
      java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
      at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
      at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
      at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
      at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
      at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
      at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
      at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
      at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
      at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
      at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
      at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
      at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.base/java.lang.Thread.run(Unknown Source)
      21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
      21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 3
      21/03/10 07:00:01 INFO Executor: Running task 1.1 in stage 0.0 (TID 3)
      21/03/10 07:00:01 ERROR Executor: Exception in task 2.0 in stage 0.0 (TID 2)
      java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
      at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:170)
      at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
      at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
      at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
      at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
      at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
      at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1853)
      at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:737)
      at org.apache.spark.util.Utils$.fetchFile(Utils.scala:522)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7(Executor.scala:871)
      at org.apache.spark.executor.Executor.$anonfun$updateDependencies$7$adapted(Executor.scala:862)
      at scala.collection.TraversableLike$WithFilter.$anonfun$foreach$1(TraversableLike.scala:877)
      at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
      at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
      at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
      at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:876)
      at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$updateDependencies(Executor.scala:862)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
      at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
      at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
      at java.base/java.lang.Thread.run(Unknown Source)
      21/03/10 07:00:01 INFO Executor: Fetching hdfs:///tmp/spark-examples_2.12-3.0.125067.jar with timestamp 1615359587441
      21/03/10 07:00:01 INFO CoarseGrainedExecutorBackend: Got assigned task 4
      21/03/10 07:00:01 INFO Executor: Running task 0.1 in stage 0.0 (TID 4)
      21/03/10 07:00:01 ERROR Executor: Exception in task 1.1 in stage 0.0 (TID 3)
      java.io.IOException: Incomplete HDFS URI, no host: hdfs:///tmp/spark-examples_2.12-3.0.125067.jar
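
      The "Incomplete HDFS URI, no host" error suggests the executor never sees fs.defaultFS from core-site.xml, so it cannot resolve the host-less hdfs:/// URI. For completeness, a sketch of the kind of ConfigMap that fixes it when mounted via the executor template above (the namenode address is a placeholder):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: hadoop-conf
        namespace: test
      data:
        core-site.xml: |
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <!-- placeholder namenode address -->
              <value>hdfs://namenode.example.com:8020</value>
            </property>
          </configuration>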

          People

            Assignee: Unassigned
            Reporter: Yue Peng (ypeng65)
            Votes: 0
            Watchers: 3
