We rewrote DistributedQueue in SOLR-6760 to optimize its obvious use case as a FIFO. But in doing so, we broke the assumptions in OverseerTaskQueue.peekTopN().
OverseerTaskQueue.peekTopN() filters out items you're already working on; it's trying to peek for new items in the queue beyond what you already know about. But DistributedQueue (being designed as a FIFO) doesn't know about the filtering: as long as it has any items in memory, it just keeps returning those over and over without ever pulling new data from ZK. This is true even if the watcher has fired and marked the state as dirty. So OverseerTaskQueue gets into a state where it can never read new items from ZK, because DQ keeps returning the same items that OverseerTaskQueue has already marked as in-progress.
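
To make the failure mode concrete, here is a minimal, self-contained sketch. It is not the actual Solr code: CachingQueue, peekCachedOnly(), peekExcluding(), and the qn-000N ids are hypothetical stand-ins, and a TreeSet stands in for the children in ZK. peekCachedOnly() models the behavior described above (any cached item suppresses the refetch, even when dirty). peekExcluding() shows one possible direction, assuming the queue is told which items to skip; it is not necessarily the fix that should be adopted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

class StaleCacheSketch {

    /** Hypothetical, heavily simplified stand-in for a caching FIFO queue. */
    static class CachingQueue {
        private final SortedSet<String> zk;                    // stand-in for the children in ZK
        private final TreeSet<String> cache = new TreeSet<>(); // in-memory copy of the children
        private boolean dirty = true;                          // set when the watcher fires

        CachingQueue(SortedSet<String> zk) { this.zk = zk; }

        void watcherFired() { dirty = true; }

        private void refresh() {
            cache.clear();
            cache.addAll(zk);
            dirty = false;
        }

        /** Models the reported behavior: refetch only when the cache is empty. */
        List<String> peekCachedOnly(int n) {
            if (cache.isEmpty()) refresh();
            return firstN(n, Collections.<String>emptySet());
        }

        /** Assumed remedy: apply the filter inside the queue and refetch when dirty. */
        List<String> peekExcluding(int n, Set<String> excluded) {
            List<String> found = firstN(n, excluded);
            if (found.isEmpty() && dirty) {
                refresh();
                found = firstN(n, excluded);
            }
            return found;
        }

        private List<String> firstN(int n, Set<String> excluded) {
            List<String> out = new ArrayList<>();
            for (String id : cache) {
                if (out.size() == n) break;
                if (!excluded.contains(id)) out.add(id);
            }
            return out;
        }
    }

    /** Returns the new (not in-progress) items the consumer ends up seeing. */
    static List<String> demo(boolean filterAware) {
        TreeSet<String> zk = new TreeSet<>();
        zk.add("qn-0001");
        CachingQueue q = new CachingQueue(zk);
        Set<String> inProgress = new HashSet<>(q.peekCachedOnly(1)); // consumer picks up qn-0001

        zk.add("qn-0002");  // a new task arrives in ZK
        q.watcherFired();   // the watcher marks the state dirty

        if (filterAware) return q.peekExcluding(1, inProgress);
        List<String> seen = new ArrayList<>(q.peekCachedOnly(1)); // cache is non-empty, so no refetch
        seen.removeAll(inProgress);                               // consumer-side filtering
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("cached-only peek sees: " + demo(false));
        System.out.println("filter-aware peek sees: " + demo(true));
    }
}
```

In the cached-only case the consumer filters away the only item it is handed and never learns about qn-0002, which is the stuck state described above. In the filter-aware case the queue notices that nothing in memory passes the filter, sees the dirty flag, and refetches.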