This build hung: http://build-us-00.elastic.co/job/es_core_15_centos/230/testReport/junit/org.elasticsearch.index.engine/InternalEngineTests/testDeletesAloneCanTriggerRefresh/
Only one thread was stalled in DocumentsWriterStallControl, which means we have a bug somewhere, because that thread should have un-stalled once the other (too many) threads finished flushing their segments.
I think we should make a simple defensive change here: instead of calling wait() with no timeout, which blocks forever until a notify()/notifyAll() wakes it up, we should wait only up to a time limit. That way, when a concurrency bug like this strikes, the thread resumes after the timeout instead of hanging forever.
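A minimal sketch of the timed wait proposed above, to make the idea concrete. This is not the actual DocumentsWriterStallControl code: the class name StallControlSketch, the methods updateStalled and waitIfStalled, and the maxWaitMillis parameter are all hypothetical names chosen for illustration. The only real API used is Object.wait(long) and Object.notifyAll().

```java
// Hypothetical illustration only; not Lucene's DocumentsWriterStallControl.
final class StallControlSketch {
    private boolean stalled;

    // Called by flushing threads when the stall condition changes.
    synchronized void updateStalled(boolean stalled) {
        this.stalled = stalled;
        if (!stalled) {
            notifyAll(); // wake every indexing thread blocked in waitIfStalled
        }
    }

    // Blocks while stalled, but never longer than maxWaitMillis.
    // Returns true if the control was still stalled when we gave up.
    synchronized boolean waitIfStalled(long maxWaitMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        while (stalled) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                break; // time limit reached: stop waiting even without a notify
            }
            try {
                // wait(0) means "wait forever", so always pass at least 1 ms.
                wait(Math.max(1L, remainingNanos / 1_000_000L));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        return stalled;
    }
}
```

With this shape, a missed notifyAll() costs at most maxWaitMillis of delay rather than a permanent hang. The loop re-checks the flag so that a spurious wakeup does not end the wait early.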
I cannot reproduce that particular hang... what's unique about that test is it uses a positively minuscule (1 KB) IW buffer.