Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Environment: We run our Trogdor clusters within Kubernetes.
Description
As part of my performance tests, I am running 3000 workloads within Trogdor. The clients seem to handle this fine, but when I reset and run the same test again, Trogdor becomes sluggish.
Here are the steps to reproduce this:
1. Run 3000 workloads in Trogdor, a combination of Produce/Consume workloads.
2. Wait for the workloads to complete.
3. Run the DELETE API calls to destroy all 3000 workloads to reset for the next run.
4. Confirm via the API that there are no workloads defined in the system.
5. Run an additional 3000 workloads in Trogdor, as in step 1.
The Coordinator takes a long time to start the second batch of 3000 workloads. There seems to be a performance issue in the framework that will take a while to debug. At this point I don't know whether it affects only the Coordinator or the Agents as well. I do not currently have the time to look into this, so I am creating this issue to track it.
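The reproduction steps above could be scripted roughly as follows. This is only a sketch, not the script used for the original tests. It assumes the Coordinator REST API is reachable at `http://localhost:8889` and exposes `POST /coordinator/task/create`, `DELETE /coordinator/tasks?taskId=...` and `GET /coordinator/tasks` (returning a JSON object with a `tasks` map); verify these against your Trogdor version. The helper names and the `make_spec` callback are hypothetical, and waiting for completion (step 2) is left out.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed Coordinator address; adjust for your Kubernetes service.
COORDINATOR = "http://localhost:8889"


def http_send(method, path, body=None):
    """Send one request to the Coordinator and return the decoded JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        COORDINATOR + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def run_batch(task_ids, make_spec, send=http_send, clock=time.monotonic):
    """Steps 1 and 5: create every task; return seconds spent submitting them."""
    start = clock()
    for task_id in task_ids:
        # make_spec is a hypothetical callback returning the task spec as a dict.
        send("POST", "/coordinator/task/create",
             {"id": task_id, "spec": make_spec(task_id)})
    return clock() - start


def delete_all(task_ids, send=http_send):
    """Steps 3 and 4: delete every task, then return how many tasks remain."""
    for task_id in task_ids:
        send("DELETE", "/coordinator/tasks?taskId=" + urllib.parse.quote(task_id))
    # Assumes the reply has a "tasks" map keyed by task id.
    remaining = send("GET", "/coordinator/tasks").get("tasks", {})
    return len(remaining)
```

Calling `run_batch` once, then `delete_all`, then `run_batch` again and comparing the two returned durations gives a rough measure of the slowdown. It only times the create calls, so it may understate the delay if the Coordinator accepts tasks quickly but is slow to start them.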
My current workaround is to destroy and recreate the Trogdor cluster between test runs.