[SPARK-24547] Spark on K8s docker-image-tool.sh improvements - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 2.4.0
Fix Version/s: 2.4.0
Component/s: Kubernetes, Spark Core
Labels:
- docker
- kubernetes
- spark

Description

Context

PySpark support for Spark on k8s was merged a few days ago in https://github.com/apache/spark/pull/21092/files

There is a helper script that builds the Docker images used to run Java jobs and, now, Python jobs as well. It works like this:

/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 build
/path/to/docker-image-tool.sh -r node001:5000/brightcomputing -t v2.4.0 push

Problem

I ran into two issues. The first time I generated images for 2.4.0, Docker used its cache, so when running jobs the old jars were still in the Docker image. This produces errors like the following in the executors:

2018-06-13 10:27:52 INFO NettyBlockTransferService:54 - Server created on 172.29.3.4:44877
2018-06-13 10:27:52 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2018-06-13 10:27:52 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(1, 172.29.3.4, 44877, None)
2018-06-13 10:27:52 ERROR CoarseGrainedExecutorBackend:91 - Executor self-exiting due to : Unable to create executor due to Exception thrown in awaitResult:
org.apache.spark.SparkException: Exception thrown in awaitResult:
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
    at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:64)
    at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:241)
    at org.apache.spark.executor.Executor.<init>(Executor.scala:116)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:83)
    at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
    at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
    at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
    at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.InvalidClassException: org.apache.spark.storage.BlockManagerId; local class incompatible: stream classdesc serialVersionUID = 6155820641931972169, local class serialVersionUID = -3720498261147521051
    at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)

To avoid that, Docker has to build without its cache, but this is only needed if you have built images for an older version in the past.
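As a rough illustration of what building without the cache means at the Docker level, the sketch below only prints the `docker build` command it would run, with and without `--no-cache`. The function name `build_cmd`, the image name layout (`repo/spark:tag`) and the Dockerfile path are assumptions made for this example; they are not taken from the script.

```shell
#!/bin/sh
# Print the docker build command for an image; docker itself is not invoked.
# Arguments: image name, Dockerfile path, and "true" to bypass the layer cache.
# build_cmd is a hypothetical helper, not part of docker-image-tool.sh.
build_cmd() {
  image="$1"
  dockerfile="$2"
  no_cache="${3:-false}"
  flags=""
  if [ "$no_cache" = "true" ]; then
    # --no-cache makes Docker rebuild every layer, so stale jars are not reused.
    flags="--no-cache "
  fi
  echo "docker build ${flags}-t ${image} -f ${dockerfile} ."
}

# Assumed image name and placeholder Dockerfile path, for illustration only.
build_cmd node001:5000/brightcomputing/spark:v2.4.0 path/to/Dockerfile true
```

Printing the command instead of running it keeps the sketch safe to try on a machine without a Docker daemon.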

The second problem was that the spark container was pushed, but the spark-py container was not. This was simply overlooked in the initial PR.

(A third problem, which I ran into only because I had an older Docker version, is covered by https://github.com/apache/spark/pull/21551, so I have not included a fix for it in this ticket.)

Other than that it works great!

Solution

I've added an extra flag, so it is now possible to call build with `-n` for `--no-cache`.
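A minimal sketch of how such a flag could be parsed with `getopts` is shown here. It is not the actual script source: the variable name `NOCACHEARG`, the function `parse_opts` and the reduced option set (`-r`, `-t`, `-n` only) are assumptions for illustration.

```shell
#!/bin/sh
# Illustrative sketch only; not the actual docker-image-tool.sh source.
# NOCACHEARG and parse_opts are hypothetical names.
REPO=""
TAG=""
NOCACHEARG=""

parse_opts() {
  OPTIND=1        # reset so the function can be called more than once
  NOCACHEARG=""
  while getopts "r:t:n" opt "$@"; do
    case "$opt" in
      r) REPO="$OPTARG" ;;
      t) TAG="$OPTARG" ;;
      n) NOCACHEARG="--no-cache" ;;   # -n takes no argument
      *) return 1 ;;
    esac
  done
}

parse_opts -r docker.io/myrepo -t v2.3.0 -n
# Only print the command that would be run; docker is not invoked here.
echo "docker build ${NOCACHEARG} -t ${REPO}/spark:${TAG} ."
```

Keeping the flag as a string that is either empty or `--no-cache` lets the same build line serve both cases.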

And I've added the extra push for the spark-py container.
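The push step can be pictured as a loop over both image names, so that neither is forgotten. This is a sketch under assumptions: `push_cmds` is a hypothetical helper, the `repo/name:tag` layout is assumed, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Print one docker push command per image; docker itself is not invoked.
# push_cmds is a hypothetical helper, not part of docker-image-tool.sh.
push_cmds() {
  repo="$1"
  tag="$2"
  # Iterate over both images so spark-py is pushed alongside spark.
  for name in spark spark-py; do
    echo "docker push ${repo}/${name}:${tag}"
  done
}

push_cmds node001:5000/brightcomputing v2.4.0
```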

Example

./bin/docker-image-tool.sh -r docker.io/myrepo -t v2.3.0 -n build

Snippet from the help output:

Options:
  -f file    Dockerfile to build for JVM based Jobs. By default builds the Dockerfile shipped with Spark.
  -p file    Dockerfile with Python baked in. By default builds the Dockerfile shipped with Spark.
  -r repo    Repository address.
  -t tag     Tag to apply to the built image, or to identify the image to be pushed.
  -m         Use minikube's Docker daemon.
  -n         Build docker image with --no-cache

Issue Links

links to

[Github] Pull Request #21555 (rayburgemeestre)

People

Assignee: Unassigned
Reporter: Ray Burgemeestre
Votes: 0
Watchers: 3

Dates

Created: 13/Jun/18 11:25
Updated: 17/May/20 18:25
Resolved: 21/Jun/18 00:11