I don't believe the CapacityScheduler supports changing a leaf queue into a parent queue, just as it doesn't support deleting a queue. Both can be accomplished by restarting the RM, but at that point it's effectively a fresh queue setup, and we're avoiding the changes that are "hard" to do in place. If they were easy, we'd just support them as refreshable options rather than requiring a restart. Supporting these kinds of config changes during a work-preserving RM restart essentially requires us to tackle them as if we were refreshing, because apps and containers aren't wiped off the cluster between the changes. That means we need to hammer out exactly what the semantics are, unless we declare it outright wrong to set up the configs like that.
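For concreteness, a sketch of the config change in question (queue names here are hypothetical): a leaf queue becomes a parent the moment a child list is declared under it in capacity-scheduler.xml, orphaning any apps that were running in it.

```xml
<!-- Before: root.a is a leaf queue that apps can be submitted to. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>a,b</value>
</property>

<!-- After: declaring children under root.a makes it a parent queue.
     Any apps that were running in root.a no longer have a valid queue.
     Queue names a1/a2 are made up for illustration. -->
<property>
  <name>yarn.scheduler.capacity.root.a.queues</name>
  <value>a1,a2</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.a1.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.a2.capacity</name>
  <value>50</value>
</property>
```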
Killing an app when its queue disappears, either because it was deleted or because it suddenly became a parent queue, is a bit severe, especially if it was an accident (e.g., someone typo'd the queue name in the list of child queues while adding an unrelated queue). However, I'm not sure we have many other good options. We could move the application to another queue so it survives, but then the question is which queue to use. There may not be a default queue, the user may not have permissions on any other queue, all other queues could already be at max app capacity, etc.
Another option is to put the app in limbo and "pause" it: it won't get any more resources, but we won't kill its outstanding containers. Basically we'd be waiting for the user to move it themselves so it can progress. But in the interim the accounting is messed up, because cluster resources are being consumed by something that isn't in any queue.
So for now, killing the app seems to be the path of least resistance if the RM has to survive. I agree with Karthik that the fail-fast config seems appropriate for determining whether the RM should refuse to come up with that config or kill the affected apps and survive.
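A rough sketch of what that could look like in yarn-site.xml, assuming we key the behavior off the existing fail-fast property (whether this exact property should gate it is an open question):

```xml
<!-- Hypothetical semantics for queue-validation failures on recovery:
     true  = refuse to start the RM if recovered apps reference a queue
             that no longer exists (or is now a parent queue)
     false = kill the orphaned apps and let the RM come up -->
<property>
  <name>yarn.resourcemanager.fail-fast</name>
  <value>true</value>
</property>
```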