Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Labels: None
Description
The e2e test 'Test_With_Spark_Jobs' waits in turn for the three Spark applications to reach the 'Running' state, which is incorrect: we cannot guarantee the jobs are still running by the time we perform the check.
We should check the Spark driver pod state through the KubeCtl client instead of YuniKorn's RestClient, because the application is removed from the core after it has completed (see the sketch after the links below).
Link of code: test/e2e/spark_jobs_scheduling/spark_jobs_scheduling_test.go#L147-L149
Failed e2e test link: https://github.com/apache/yunikorn-k8shim/actions/runs/6596046649/job/17926552721#step:5:2098
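A minimal sketch of the proposed check, using plain client-go rather than the test framework's own KubeCtl wrapper; the helper name waitForSparkDriverPod and the label selector (spark-app-selector, spark-role=driver, as set by spark-submit in cluster mode) are illustrative assumptions, not the project's actual API:
{code:go}
package sparkjobs

import (
	"context"
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForSparkDriverPod (hypothetical helper) polls the Spark driver pod's
// phase through the Kubernetes API instead of YuniKorn's REST API, so an
// application that has already finished is still observable.
func waitForSparkDriverPod(ctx context.Context, client kubernetes.Interface,
	namespace, sparkAppID string, timeout time.Duration) error {
	// Label selector is an assumption based on the labels spark-submit sets
	// on driver pods in cluster mode.
	selector := fmt.Sprintf("spark-app-selector=%s,spark-role=driver", sparkAppID)
	return wait.PollUntilContextTimeout(ctx, 5*time.Second, timeout, true,
		func(ctx context.Context) (bool, error) {
			pods, err := client.CoreV1().Pods(namespace).List(ctx,
				metav1.ListOptions{LabelSelector: selector})
			if err != nil {
				return false, err
			}
			for _, pod := range pods.Items {
				// Accept Succeeded as well as Running: the job may complete
				// before the e2e assertion runs, as seen in the failed run.
				if pod.Status.Phase == v1.PodRunning || pod.Status.Phase == v1.PodSucceeded {
					return true, nil
				}
			}
			return false, nil
		})
}
{code}
Accepting both Running and Succeeded avoids the race shown in the log analysis below, where the application completed roughly ten seconds before the Ginkgo check started waiting for it.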
Failed e2e test log analysis:
- 17:18:09Z Pod for app spark-e27dd9a2140844828fdfb3d80e9fa1b4 created
- 17:18:11.725869Z (PodEvent in Log) PodEvent ‘Scheduling’ received
- 17:18:11.727811Z (PodEvent in Log) PodEvent ‘Scheduled’ received
- 17:18:11.735646Z (PodEvent in Log) PodEvent ‘PodBindSuccessful’ received
- 17:20:10.965501Z (PodEvent in Log) PodEvent ‘TaskCompleted’ received
- 17:20:20.159 (Ginkgo) Waiting for application spark-e27dd9a2140844828fdfb3d80e9fa1b4 to reach Running (the application had already completed before this check)
- 17:26:25.9749 (Ginkgo) timeout