Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Incomplete
Affects Version/s: 2.2.0
Fix Version/s: None
Environment:
Physical lab configuration: 8 bare-metal servers, each with 56 cores, 384 GB RAM, RHEL 7.4.
Kernel: 3.10.0-862.9.1.el7.x86_64
redhat-release-server.x86_64 7.4-18.el7
Kubernetes info:
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:10:24Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Description
Launched the Spark Thrift server in a Kubernetes cluster with dynamic allocation enabled.
Configurations set:
spark.executor.memory=35g
spark.executor.cores=8
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=10
spark.dynamicAllocation.cachedExecutorIdleTimeout=15
spark.driver.memory=10g
spark.driver.cores=4
spark.sql.crossJoin.enabled=true
spark.sql.starJoinOptimization=true
spark.sql.codegen=true
spark.rpc.numRetries=5
spark.rpc.retry.wait=5
spark.sql.broadcastTimeout=1200
spark.network.timeout=1800
spark.dynamicAllocation.maxExecutors=15
spark.kubernetes.allocation.batch.size=2
spark.kubernetes.allocation.batch.delay=9
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kubernetes.node.selector.is_control=false
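For reference, a minimal sketch of how a configuration like this might be passed when launching the Thrift server; the master URL, image, and namespace below are placeholders and not taken from this report:

```shell
# Hypothetical launch command; the k8s API server address, container image,
# and namespace are placeholders. Only a subset of the properties above is shown.
./sbin/start-thriftserver.sh \
  --master k8s://https://<api-server>:6443 \
  --conf spark.kubernetes.namespace=<namespace> \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.executor.memory=35g \
  --conf spark.executor.cores=8 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.executorIdleTimeout=10 \
  --conf spark.dynamicAllocation.maxExecutors=15 \
  --conf spark.kubernetes.allocation.batch.size=2 \
  --conf spark.kubernetes.allocation.batch.delay=9
```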
Ran TPC-DS queries against 1 TB of Snappy-compressed Parquet data.
Found that as the execution progressed, all tasks were handled by a single executor (executor 53) and no new executors were spawned, even though there were enough resources available to spawn more.
Manually deleted executor pod 53 and observed that no new executor was spawned to replace it.
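The symptom above can be reproduced and observed with standard kubectl commands; the namespace and pod name below are illustrative, not taken from this report:

```shell
# List executor pods in the namespace the Thrift server runs in
# (namespace "spark" is a placeholder)
kubectl get pods -n spark

# Delete one executor pod to see whether the driver requests a replacement
kubectl delete pod <executor-pod-name> -n spark

# Watch whether any new executor pod is subsequently created
kubectl get pods -n spark --watch
```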
Attached the