Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Labels: None
Description
The e2e test 'Test_With_Spark_Jobs' waits in turn for the three Spark applications to reach the 'Running' state, which is incorrect: we cannot guarantee the jobs are still running by the time we perform the check.
We should check the Spark driver pod state through the KubeCtl client instead of YuniKorn's RestClient, because the application is removed from the core after it has completed (see the sketch after the links below).
Link of code: test/e2e/spark_jobs_scheduling/spark_jobs_scheduling_test.go#L147-L149
Failed e2e test link: https://github.com/apache/yunikorn-k8shim/actions/runs/6596046649/job/17926552721#step:5:2098
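A minimal sketch of the proposed check, using plain client-go rather than the test framework's own KubeCtl wrapper; the helper name waitForSparkDriverPod and the label selector (spark-app-selector, spark-role=driver, as set by spark-submit in cluster mode) are illustrative assumptions, not the project's actual API:
{code:go}
package sparkjobs

import (
	"context"
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForSparkDriverPod (hypothetical helper) polls the Spark driver pod's
// phase through the Kubernetes API instead of YuniKorn's REST API, so an
// application that has already finished is still observable.
func waitForSparkDriverPod(ctx context.Context, client kubernetes.Interface,
	namespace, sparkAppID string, timeout time.Duration) error {
	// Label selector is an assumption based on the labels spark-submit sets
	// on driver pods in cluster mode.
	selector := fmt.Sprintf("spark-app-selector=%s,spark-role=driver", sparkAppID)
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, timeout, true,
		func(ctx context.Context) (bool, error) {
			pods, err := client.CoreV1().Pods(namespace).List(ctx,
				metav1.ListOptions{LabelSelector: selector})
			if err != nil {
				return false, err
			}
			for _, pod := range pods.Items {
				// Accept Succeeded as well as Running: the job may complete
				// before the e2e assertion runs, as seen in the failed run.
				if pod.Status.Phase == v1.PodRunning || pod.Status.Phase == v1.PodSucceeded {
					return true, nil
				}
			}
			return false, nil
		})
}
{code}
Accepting both Running and Succeeded avoids the race shown in the log analysis below, where the application completed roughly ten seconds before the Ginkgo check started waiting for it.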
Failed e2e test log analysis:
- 17:18:09Z Pod for app spark-e27dd9a2140844828fdfb3d80e9fa1b4 created
- 17:18:11.725869Z (PodEvent in Log) PodEvent ‘Scheduling’ received
- 17:18:11.727811Z (PodEvent in Log) PodEvent ‘Scheduled’ received
- 17:18:11.735646Z (PodEvent in Log) PodEvent ‘PodBindSuccessful’ received
- 17:20:10.965501Z (PodEvent in Log) PodEvent ‘TaskCompleted’ received
- 17:20:20.159 (Ginkgo) Waiting for application spark-e27dd9a2140844828fdfb3d80e9fa1b4 to reach Running (the application had already completed before this check)
- 17:26:25.9749 (Ginkgo) timeout