Details
- Type: Umbrella
- Status: Resolved
- Priority: Blocker
- Resolution: Done
- Fix Version: 3.4.0
Description
There are five problems to address.
First, the scheduled jobs are broken, as the following runs show:
https://github.com/apache/spark/actions/runs/2513261706
https://github.com/apache/spark/actions/runs/2512750310
https://github.com/apache/spark/actions/runs/2509238648
https://github.com/apache/spark/actions/runs/2508246903
https://github.com/apache/spark/actions/runs/2507327914
https://github.com/apache/spark/actions/runs/2506654808
https://github.com/apache/spark/actions/runs/2506143939
https://github.com/apache/spark/actions/runs/2502449498
https://github.com/apache/spark/actions/runs/2501400490
https://github.com/apache/spark/actions/runs/2500407628
https://github.com/apache/spark/actions/runs/2499722093
https://github.com/apache/spark/actions/runs/2499196539
https://github.com/apache/spark/actions/runs/2496544415
https://github.com/apache/spark/actions/runs/2495444227
https://github.com/apache/spark/actions/runs/2493402272
https://github.com/apache/spark/actions/runs/2492759618
https://github.com/apache/spark/actions/runs/2492227816
See also https://github.com/apache/spark/pull/36899 or https://github.com/apache/spark/pull/36890
In the master branch, it seems that at least the Hadoop 2 build is currently broken.
Second, it is very difficult to navigate the scheduled jobs now. We have to open https://github.com/apache/spark/actions/workflows/build_and_test.yml?query=event%3Aschedule and search through the runs manually, one by one.
Since GitHub added support for reusing workflows across files, we should leverage this feature; see also https://github.com/apache/spark/blob/master/.github/workflows/build_and_test_ansi.yml and https://docs.github.com/en/actions/using-workflows/reusing-workflows. Once the jobs are separated this way, each scheduled job can be defined as its own workflow.
Namely, each scheduled job should then be listed under "All workflows" at https://github.com/apache/spark/actions so that other developers can easily track them.
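A minimal sketch of what one split-out scheduled job could look like, following the pattern of build_and_test_ansi.yml. The file name, job name, cron time, and the `hadoop` input are all hypothetical; the real build_and_test.yml would need to declare a matching `workflow_call` trigger with that input.

```yaml
# .github/workflows/build_hadoop2.yml -- hypothetical caller workflow.
# Appears as its own entry under "All workflows", so its scheduled runs
# are easy to find.
name: "Build (master, Hadoop 2)"

on:
  schedule:
    - cron: "0 7 * * *"  # assumed time

jobs:
  run-build:
    # Reuse the main workflow instead of duplicating its job definitions.
    uses: ./.github/workflows/build_and_test.yml
    # Hypothetical input; the called workflow would declare it under
    # 'on: workflow_call: inputs:'.
    with:
      hadoop: hadoop2
```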
Third, we should set up scheduled jobs for branch-3.3; see also https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L78-L83 for the branch-3.2 job.
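Until the workflows are separated, adding a branch-3.3 schedule would presumably mirror the linked branch-3.2 lines: one more cron entry, plus a step that maps the firing cron expression back to a branch via `github.event.schedule`. The cron time below is an assumption.

```yaml
on:
  schedule:
    # Existing entries for master / branch-3.2 omitted; a new entry for branch-3.3:
    - cron: "0 9 * * *"

jobs:
  configure-jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Pick the branch for this schedule
        # Each cron expression identifies one scheduled build, as the
        # existing branch-3.2 entry does (time is hypothetical).
        run: |
          if [ "${{ github.event.schedule }}" = "0 9 * * *" ]; then
            echo "BRANCH=branch-3.3" >> "$GITHUB_ENV"
          fi
```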
Fourth, we should improve the duplicated-test skipping logic. See also https://github.com/apache/spark/pull/36413#issuecomment-1157205469 and https://github.com/apache/spark/pull/36888
Fifth, we should probably replace the base image (https://github.com/apache/spark/blob/master/.github/workflows/build_and_test.yml#L302, https://github.com/dongjoon-hyun/ApacheSparkGitHubActionImage) with a plain Ubuntu image plus a Docker image cache. See also https://github.com/docker/build-push-action/blob/master/docs/advanced/cache.md
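A rough sketch of the cached-build approach from the linked docker/build-push-action cache guide, using the GitHub Actions cache backend. The Dockerfile path is an assumption; the image would be built from a plain Ubuntu base in that Dockerfile rather than pulled from a personal prebuilt image.

```yaml
jobs:
  build-image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # buildx is required for the 'gha' cache exporter
      - uses: docker/setup-buildx-action@v2
      - name: Build test image with layer cache
        uses: docker/build-push-action@v3
        with:
          context: .
          file: dev/infra/Dockerfile  # assumed path
          push: false
          # Reuse layers cached by earlier runs; 'mode=max' caches all layers.
          cache-from: type=gha
          cache-to: type=gha,mode=max
```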
Attachments
Issue Links
- is related to
  - SPARK-39609 PySpark need to support pypy3.8 to avoid "No module named '_pickle'" - Open
  - SPARK-39573 Recover PySpark jobs for branch-3.2 w/ Scala 2.13 - Open