Description
While running a workload with concurrent writes and deletes, I saw many NotReplicatedYetExceptions thrown because blockReceived reports from the DN arrived late. The DN logs showed that the blockReceived message was delayed by as much as 15 seconds because the OfferService thread was blocked on file system operations while processing deletes. We previously moved the deletions themselves to another thread, but the delete path still accesses the file system in the main thread to determine the block length. On a heavily loaded system this can take a long time.
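One possible direction, sketched below, is to defer the block length lookup to the same background thread that performs the deletion, so that the reporting thread only enqueues work and never touches the disk. This is a minimal illustration and not the actual DataNode code: the class name AsyncBlockDeleter and the method deleteAsync are hypothetical, and a plain java.io.File stands in for a block file.

```java
import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: names and structure are assumptions, not HDFS APIs.
public class AsyncBlockDeleter {
    // Single background thread that owns all disk access for deletions.
    private final ExecutorService deleter = Executors.newSingleThreadExecutor();

    /**
     * Called from the reporting thread. It performs no file system access
     * itself; both the length lookup and the delete run on the background
     * thread. The returned Future yields the number of bytes freed.
     */
    public Future<Long> deleteAsync(File blockFile) {
        return deleter.submit(() -> {
            // Potentially slow on a loaded disk, so it happens off the caller's thread.
            long length = blockFile.length();
            if (!blockFile.delete()) {
                throw new IOException("Failed to delete " + blockFile);
            }
            return length;
        });
    }

    /** Stops accepting new deletions; queued ones still run to completion. */
    public void shutdown() {
        deleter.shutdown();
    }
}
```

With this shape the caller returns immediately after submit, so a slow length lookup would delay only the deletion queue and not other work on the calling thread, such as sending blockReceived reports. Any accounting that needs the freed length would have to consume the Future asynchronously rather than block on it.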