This work item refactors the router shutdown sequence.
On shutdown, the router currently terminates all threads except the main thread mid-process. All state is essentially frozen at the point where the shutdown signal is received. The main thread then attempts to clean up that state before exiting.
This approach is error-prone and results in memory leaks (see open JIRAs). In addition, it requires a bespoke cleanup handler that essentially duplicates run-time cleanup code (e.g. link close handling, connection close handling, etc.).
It would be better to implement a controlled shutdown that leverages the "normal" connection close/management delete code that exists in the router.
For example, the new shutdown process could go something like this:
Add two new attributes to the "router" management entity: adminStatus and operStatus:
adminStatus values: ["up", "down"]
operStatus values: ["active", "quiescing", "shutdown"]
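As a rough illustration of how the two attributes might interact, here is a minimal Python sketch. The class and method names (RouterStatus, set_admin_status, mark_shutdown_complete) are hypothetical and not part of the router. The one-way active -> quiescing -> shutdown progression is an assumption; the proposal does not say whether setting adminStatus back to "up" could cancel a shutdown in progress.

```python
from enum import Enum


class AdminStatus(Enum):
    UP = "up"
    DOWN = "down"


class OperStatus(Enum):
    ACTIVE = "active"
    QUIESCING = "quiescing"
    SHUTDOWN = "shutdown"


class RouterStatus:
    """Hypothetical holder for the two proposed attributes."""

    def __init__(self):
        self.admin_status = AdminStatus.UP      # proposed default
        self.oper_status = OperStatus.ACTIVE

    def set_admin_status(self, value):
        # AdminStatus(value) raises ValueError for anything but "up"/"down".
        new_status = AdminStatus(value)
        self.admin_status = new_status
        # Assumption: only the first "down" request starts quiescing.
        if new_status is AdminStatus.DOWN and self.oper_status is OperStatus.ACTIVE:
            self.oper_status = OperStatus.QUIESCING

    def mark_shutdown_complete(self):
        # Assumption: "shutdown" is only reachable from "quiescing".
        if self.oper_status is not OperStatus.QUIESCING:
            raise RuntimeError("shutdown can only complete while quiescing")
        self.oper_status = OperStatus.SHUTDOWN
```

In this sketch operStatus is read-only from the management point of view and changes only as a side effect of adminStatus or of the shutdown sequence finishing.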
adminStatus defaults to "up". When it is set to "down", either via management or by a signal (SIGTERM/QUIT/INT/etc.), initiate the shutdown process:
- Set operStatus to quiescing
- Close & delete all listeners and connectors. This will prevent new connections from being established.
- Initiate close of all active connections
- Wait for all connections to complete close and delete
- Join() all I/O threads except the main thread (this leaves only the main thread and the core thread running).
- Issue a new action to the core thread to cause it to clean up its resources and exit
- Join() the core thread.
- Clean up any remaining server state then exit the main thread.
This is just an example of a possible shutdown sequence. A proper design document should be proposed for review as a first step.
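The ordering of the example steps can be sketched with a toy model in Python. This is not router code: ToyRouter, its fields, the "shutdown" core action, and the point at which operStatus becomes "shutdown" are all assumptions made for illustration. The I/O threads here stand in for the normal connection close path, and the core thread exits only when it receives the new action.

```python
import queue
import threading


class ToyRouter:
    """Hypothetical model of the proposed shutdown ordering."""

    def __init__(self, io_thread_count=2):
        self.oper_status = "active"
        self.listeners = {"listener-1"}
        self.connectors = {"connector-1"}
        self.connections = {"conn-1", "conn-2"}
        self.steps = []                       # order of completed steps
        self._cond = threading.Condition()    # guards self.connections
        self._io_stop = threading.Event()
        self._close_requests = queue.Queue()
        self._core_actions = queue.Queue()
        self.io_threads = [
            threading.Thread(target=self._io_loop, daemon=True)
            for _ in range(io_thread_count)
        ]
        self.core_thread = threading.Thread(target=self._core_loop, daemon=True)
        for thread in self.io_threads:
            thread.start()
        self.core_thread.start()

    def _io_loop(self):
        # Stand-in for the normal run-time connection close handling.
        while not self._io_stop.is_set():
            try:
                conn = self._close_requests.get(timeout=0.05)
            except queue.Empty:
                continue
            with self._cond:
                self.connections.discard(conn)
                if not self.connections:
                    self._cond.notify_all()

    def _core_loop(self):
        # The core thread runs until it is handed the "shutdown" action.
        while self._core_actions.get() != "shutdown":
            pass
        self.steps.append("core resources released")

    def shutdown(self):
        self.oper_status = "quiescing"
        self.listeners.clear()                # no new connections from here on
        self.connectors.clear()
        self.steps.append("listeners and connectors deleted")
        with self._cond:
            pending = list(self.connections)
        for conn in pending:                  # initiate close of each connection
            self._close_requests.put(conn)
        with self._cond:                      # wait for close and delete
            if not self._cond.wait_for(lambda: not self.connections, timeout=5):
                raise TimeoutError("connections did not close")
        self.steps.append("connections closed")
        self._io_stop.set()
        for thread in self.io_threads:
            thread.join()
        self.steps.append("I/O threads joined")
        self._core_actions.put("shutdown")
        self.core_thread.join()
        self.steps.append("core thread joined")
        self.oper_status = "shutdown"         # assumed final state
```

Calling ToyRouter().shutdown() walks through the steps in the order listed above. The point of the ordering is that connections are torn down while the I/O and core threads are still running, so no separate cleanup handler is needed for them.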
Is a parent of: DISPATCH-2129 (shutdown race accessing core->running flag)