This behavior was observed in a real cluster of hundreds of nodes and hundreds of services.
After ~30 new nodes started, we expected node singletons to be deployed immediately, but the services were deployed on the last nodes with a ~4hr delay. The reason is that ALL service deployments are recalculated on ALL discovery events. In this case, a single discovery event took 2-5 minutes to process, which, with lateAffinityAssignment=true, yielded a 4hr delay for the last node.
A quick change that may improve things a lot is to abort the current assignment calculation if there is a pending discovery event that needs to be processed.
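
A minimal sketch of the abort idea, under stated assumptions: the class AbortableAssignmentSketch and its methods onDiscoveryEvent and recalculate are hypothetical names used for illustration only and are not part of Ignite's service processor. Topology versions are modeled as plain long values, and every service is treated as a node singleton.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch; not Ignite's actual implementation. */
class AbortableAssignmentSketch {
    /** Highest topology version announced by any discovery event so far. */
    private final AtomicLong latestTopVer = new AtomicLong();

    /** Assumed to be called by a discovery listener for every event. */
    void onDiscoveryEvent(long topVer) {
        latestTopVer.accumulateAndGet(topVer, Math::max);
    }

    /**
     * Recalculates assignments for the given topology version.
     * Returns an empty Optional if a newer discovery event arrived
     * while the calculation was still running.
     */
    Optional<Map<String, List<String>>> recalculate(
        long topVer, List<String> services, List<String> nodes) {
        Map<String, List<String>> assignments = new LinkedHashMap<>();

        for (String svc : services) {
            // A newer event is pending, so this result would be stale: abort.
            if (latestTopVer.get() > topVer)
                return Optional.empty();

            // Node singleton: one instance on every node in the topology.
            assignments.put(svc, new ArrayList<>(nodes));
        }

        return Optional.of(assignments);
    }
}
```

With this shape, a caller that gets an empty result simply drops the stale calculation and starts over for the newest topology version, so a burst of ~30 join events would cost roughly one full recalculation rather than one per event.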
The rest of the optimizations are tracked in https://issues.apache.org/jira/browse/IGNITE-5551