Thanks a lot for your insight on why intermediated directory is scanned before done directory and potential name node issue, Robert Joseph Evans. That makes a lot of sense. Per offline discussion with Karthik, we'd like to propose three approaches.
1. For web API requests for individual jobs, the intermediate directory is still scanned first, but inside scanIntermediateDir(), we could add checking of existence of the jhst files of the associated job (), and only when the files do exist do we move files in intermediate directory to done directory. The assumption is that file existence is not expensive, and if the files do not exist in intermediate directory, we only acquire the lock on the user directory for a short period of time.
2. For web API requests of individual jobs, when intermediate directory is scanned, check the existence of the job files, and only files of the job associated with the request are moved from intermediate directory to done directory. This reduces the time for which each job web request thread blocks, but may have much smaller overall throughput that the previous approach when file moving is done in batch.
3. Have a dedicated thread to scan the intermediate directory and other threads to wait on a monitor associated with a particular job. When the dedicated thread finishes, threads waiting on the monitors will be notified. By having a single writer, the contention on the user directory lock can be reduced. But it does have the problem of conflicting with clients' expectation as Robert Joseph Evans pointed out in previous comment.
Can you please share some of your thoughts on them, Robert Joseph Evans, Jason Lowe?