The symptom is OOM errors ("unable to create native thread"), resulting in replicas that never come up, recovery cycles that never end, and so on. Decreasing -Xss or increasing -Xmx doesn't help (perhaps some other setting would?). In my testing with 400 replicas in a single JVM, something on the order of 1,000 temporary threads were spun up when I tried to start the JVM. More precisely, that many threads were running; the rest (number unknown) never started at all.
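To illustrate the failure mode (this is a hypothetical sketch, not Solr's actual core-loading code): spawning one thread per replica exhausts native threads regardless of heap size, whereas a bounded pool caps the thread count at whatever `coreLoadThreads` allows. The names `replicaCount` and `coreLoadThreads` here are stand-ins for illustration:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BoundedCoreLoad {
    public static void main(String[] args) throws Exception {
        int replicaCount = 400;   // hypothetical: one load task per replica
        int coreLoadThreads = 3;  // bounded pool instead of thread-per-core

        // A fixed pool never creates more than coreLoadThreads native threads,
        // no matter how many replicas queue up behind it.
        ExecutorService pool = Executors.newFixedThreadPool(coreLoadThreads);
        CountDownLatch done = new CountDownLatch(replicaCount);
        for (int i = 0; i < replicaCount; i++) {
            pool.submit(() -> {
                // stand-in for the actual core-loading work
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        System.out.println("loaded " + replicaCount + " cores with "
                + coreLoadThreads + " threads");
    }
}
```

Native-thread OOM is about OS-level thread limits (stack allocations outside the heap), which is consistent with the heap looking unstressed in jconsole.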
What the default should be is certainly debatable. The curious thing was that at no point did the JVM memory appear to be stressed (spot-checking with jconsole, nothing rigorous).
I've assumed that by ordering the replicas to come up based on the collection they belong to, they won't get stuck waiting for a leader election just because instance1 happened to bring up collection1 then collection2 while instance2 brought them up in the reverse order. It's somewhat like skip lists in terms of waiting.
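The ordering idea can be sketched as a simple sort of the core descriptors before loading, so every instance walks the collections in the same order. The `CoreDescriptor` record and core names here are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CoreOrdering {
    // hypothetical descriptor: just (coreName, collection)
    record CoreDescriptor(String coreName, String collection) {}

    public static void main(String[] args) {
        List<CoreDescriptor> cores = new ArrayList<>(List.of(
                new CoreDescriptor("c2_shard1_replica1", "collection2"),
                new CoreDescriptor("c1_shard1_replica1", "collection1"),
                new CoreDescriptor("c1_shard2_replica1", "collection1")));

        // Sort by collection (then core name) so every instance brings replicas
        // up in the same collection order, instead of whatever order the
        // filesystem listing happens to give.
        cores.sort(Comparator.comparing(CoreDescriptor::collection)
                             .thenComparing(CoreDescriptor::coreName));

        cores.forEach(c -> System.out.println(c.collection() + " -> " + c.coreName()));
    }
}
```

With every instance loading collection1's replicas before collection2's, leader elections for a collection tend to find their quorum at roughly the same time across instances.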
Sure, with a weird enough topology instance1 could have a long queue to get through before reaching collectionX, while instanceN could start with collectionX and have to sit through leader-vote wait timeouts, but that's far better than a cluster that won't start at all.
And when I tested starting 3 of my 4 JVMs, it was indeed painfully slow waiting on leader-vote timeouts. But the cluster came up.
Using 3 coreLoadThreads and starting all my JVMs at once took just a few minutes.
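For reference, `coreLoadThreads` is set in `solr.xml`; a minimal sketch (the surrounding elements are abbreviated, and the exact placement may vary by Solr version):

```xml
<solr>
  <!-- cap the number of threads used to load cores at startup -->
  <int name="coreLoadThreads">3</int>
</solr>
```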
And the painfully trappy behavior without this patch is that I can create all my collections just fine, but then I can't restart the cluster successfully.