Details
Type: Improvement
Status: Resolved
Priority: Major
Resolution: Fixed
Description
The experiment service is currently the most important feature in Apache Submarine. However, it is neither stable nor user-friendly. For example:
(1) The frontend workbench does not reflect the actual experiment status (e.g., OOM).
(2) The server does not enforce some constraints imposed by the Kubernetes Java client (e.g., if the experiment name contains the character "_", the k8s Java API throws an exception).
(3) Unexpected out-of-memory errors: it is difficult for users to predict actual memory usage before running an experiment, so using the memory request and memory limit mechanism to allow memory overcommitment would help.
(4) Allow users to create experiments with the same name and retrieve them by that name.
(5) Allow users to set tags on experiments to group them into categories and retrieve them by tag.
(6) The K8sSubmitter submits an experiment to the Kubernetes cluster as soon as it is created, regardless of how much resource quota remains.
For these reasons, it is necessary to refactor and stabilize the experiment service in submarine-server.
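
As an illustration of item (2), the sketch below shows one way a name could be checked before it reaches the Kubernetes API. It is not the Submarine implementation: the function name `validate_experiment_name` is hypothetical, and it assumes experiment names must satisfy the RFC 1123 DNS label rule (lowercase alphanumerics and "-", at most 63 characters, starting and ending with an alphanumeric). The rule that actually applies depends on the resource kind being created.

```python
import re
from typing import List

# RFC 1123 DNS label: lowercase alphanumerics or '-', must start and end
# with an alphanumeric. Assumption: experiment names follow this rule.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
_MAX_LEN = 63


def validate_experiment_name(name: str) -> List[str]:
    """Return a list of problems; an empty list means the name is acceptable.

    Hypothetical helper for illustration only.
    """
    if not name:
        return ["name must not be empty"]
    problems = []
    if len(name) > _MAX_LEN:
        problems.append(f"name is longer than {_MAX_LEN} characters")
    if not _DNS_LABEL.fullmatch(name):
        problems.append(
            "name must consist of lowercase alphanumerics or '-', "
            "and must start and end with an alphanumeric"
        )
    return problems


if __name__ == "__main__":
    for candidate in ("mnist-tf-job", "mnist_tf_job"):
        print(candidate, validate_experiment_name(candidate))
```

Rejecting an invalid name at the server boundary lets the service return a clear validation message instead of surfacing an exception from the client library.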