Apache Submarine / SUBMARINE-949

[Umbrella] Refactor and stabilize experiment service in submarine-server

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.6.0
    • Component/s: Backend Server, experiment
    • Labels: None

    Description

      The experiment service is currently the most important feature in Apache Submarine. However, the service is neither stable nor user-friendly. For example:

      (1) The frontend workbench does not reflect the actual experiment status (e.g., OOM).

      (2) The server does not enforce some constraints imposed by the Kubernetes Java Client (e.g., if the experiment name contains the character "_", the k8s Java API throws an exception).

      (3) Unexpected out-of-memory errors: it is hard for users to predict the actual memory usage before running an experiment. Using the memory request and memory limit mechanism to allow memory overcommitment would therefore help users.

      (4) Users should be able to create experiments with the same name and retrieve those experiments by name.

      (5) Users should be able to set tags on experiments to divide them into categories and retrieve experiments by tag.

      (6) The K8sSubmitter submits an experiment to the Kubernetes cluster as soon as it is created, regardless of how much resource quota is left.
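
      As an illustration of item (2), the server could reject an invalid experiment name before it ever reaches the Kubernetes Java API. The sketch below is only an assumption about how such a check might look: `ExperimentNameValidator` is a hypothetical helper, not an existing submarine-server class, and it assumes the DNS-1123 label rule (lowercase alphanumerics and "-", at most 63 characters). The exact rule depends on the resource type being created, but "_" is not allowed either way.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of submarine-server. */
final class ExperimentNameValidator {
  // DNS-1123 label: lowercase alphanumerics or '-', and the name must
  // start and end with an alphanumeric character.
  private static final Pattern DNS_1123_LABEL =
      Pattern.compile("[a-z0-9]([-a-z0-9]*[a-z0-9])?");

  // Maximum length of a DNS-1123 label.
  private static final int MAX_LENGTH = 63;

  private ExperimentNameValidator() {}

  /** Returns true if the name satisfies the assumed naming rule. */
  static boolean isValid(String name) {
    return name != null
        && name.length() <= MAX_LENGTH
        && DNS_1123_LABEL.matcher(name).matches();
  }

  /** Throws IllegalArgumentException with a readable message if the name is invalid. */
  static void validate(String name) {
    if (!isValid(name)) {
      throw new IllegalArgumentException(
          "Invalid experiment name '" + name + "': use lowercase letters, digits and '-', "
              + "start and end with a letter or digit, and keep it within "
              + MAX_LENGTH + " characters.");
    }
  }
}
```

      Calling `ExperimentNameValidator.validate(name)` when the experiment spec is received would turn the Kubernetes client exception into an early, user-readable error.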
       

      For these reasons, it is necessary to refactor and stabilize the experiment service in submarine-server.
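
      To make item (3) concrete: in Kubernetes, the memory request is what the scheduler reserves for a container, while the memory limit is the ceiling beyond which the container may be OOM-killed. Setting the request below the limit is what allows overcommitment. The sketch below is an assumption about how an experiment spec could carry both values; `MemorySpec` is a hypothetical type, not an existing submarine-server class, and the nested map only mirrors the shape of the `resources` field of a container spec.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical value type for illustration; not part of submarine-server. */
final class MemorySpec {
  private final long requestMiB; // amount the scheduler reserves
  private final long limitMiB;   // hard ceiling for the container

  MemorySpec(long requestMiB, long limitMiB) {
    // Kubernetes rejects a request larger than the limit, so fail early.
    if (requestMiB <= 0 || limitMiB < requestMiB) {
      throw new IllegalArgumentException(
          "Expected 0 < request <= limit, got request=" + requestMiB
              + "Mi, limit=" + limitMiB + "Mi");
    }
    this.requestMiB = requestMiB;
    this.limitMiB = limitMiB;
  }

  /** Builds a map shaped like the "resources" field of a container spec. */
  Map<String, Map<String, String>> toResources() {
    Map<String, Map<String, String>> resources = new LinkedHashMap<>();
    resources.put("requests", Collections.singletonMap("memory", requestMiB + "Mi"));
    resources.put("limits", Collections.singletonMap("memory", limitMiB + "Mi"));
    return resources;
  }
}
```

      For example, `new MemorySpec(1024, 4096).toResources()` describes a container that is scheduled as if it needs 1024Mi but may grow up to 4096Mi before it is at risk of being killed.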

          People

            Assignee: Kai-Hsun Chen (khchen)
            Reporter: Kai-Hsun Chen (khchen)
