  Spark / SPARK-50294

Refactor docker image for testing


Details

    • Type: Umbrella
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.0
    • Fix Version/s: None
    • Component/s: Project Infra
    • Labels: None

    Description

      Currently we have only a single testing image (dev/infra/Dockerfile) for the pyspark, sparkr, lint, and docs jobs. It has two major issues:

      • disk space limitation: as we keep adding packages to the image, the disk space left for testing becomes very limited, which causes "No space left on device" failures from time to time;
      • environment conflicts: for example, even though the Dockerfile already installs some packages for docs, we still need to install additional Python packages in build_and_test because of conflicts between the docs and pyspark dependencies. This is hard to maintain because related packages are installed in different places.


      So we want to split the existing base image into multiple images, so that we can:

      • completely cache all the dependencies for each job;
      • centralize related installations for each job;
      • free up disk space on the base image;
      • introduce new dev tools based on the new images.

      Sub-Tasks

        1. Add a separate docker file for doc build (Sub-task, Resolved, Ruifeng Zheng)
        2. Add a separate docker file for linter (Sub-task, Resolved, Ruifeng Zheng)
        3. Add a script to build docs with image (Sub-task, Resolved, Pan Bingkun)
        4. Add a separate docker file for SparkR (Sub-task, Resolved, Ruifeng Zheng)
        5. Extract the common content of `Dockerfile` from `Docs`, `Linter`, and `SparkR` images (Sub-task, Open, Pan Bingkun)
        6. Add a separate docker file for python 3.9 daily build (Sub-task, Resolved, Ruifeng Zheng)
        7. Add a separate docker file for python 3.10 daily build (Sub-task, Resolved, Ruifeng Zheng)
        8. Add a separate docker file for python 3.12 daily build (Sub-task, Resolved, Ruifeng Zheng)
        9. Add a separate docker file for python 3.13 daily build (Sub-task, Resolved, Ruifeng Zheng)
        10. Add a separate docker file for PyPy 3.10 daily build (Sub-task, Resolved, Ruifeng Zheng)
        11. Add a separate docker file for Python 3.11 daily coverage build (Sub-task, Resolved, Ruifeng Zheng)
        12. Apply Python 3.11 image in Java 21 daily build (Sub-task, Resolved, Ruifeng Zheng)
        13. Apply Python 3.11 image in No-ANSI daily build (Sub-task, Resolved, Ruifeng Zheng)
        14. Apply Python 3.11 image in RocksDB as UI Backend daily build (Sub-task, Resolved, Ruifeng Zheng)
        15. Apply Python 3.11 image in PR build (Sub-task, Resolved, Ruifeng Zheng)
        16. Skip unnecessary image build and push (Sub-task, Resolved, Ruifeng Zheng)
        17. Make 3.5 daily build able to manually trigger (Sub-task, Resolved, Ruifeng Zheng)
        18. Make more daily builds able to manually trigger (Sub-task, Resolved, Ruifeng Zheng)
        19. Add a daily build for PySpark with old dependencies (Sub-task, Resolved, Ruifeng Zheng)
        20. Add a daily build for Pandas API on Spark with old dependencies (Sub-task, Resolved, Ruifeng Zheng)
        21. Make pyspark-pandas module no longer depend on pyspark module (Sub-task, Resolved, Ruifeng Zheng)


          People

            Assignee: Unassigned
            Reporter: Ruifeng Zheng (podongfeng)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:
              Resolved: