Xuan Gong, thanks for reviewing.
on the latest patch, looks like you change the logic for
Yes, the logic for handing out resources to be localized has indeed changed.
Previously, LocalizerRunner did not give the next resource to ContainerLocalizer until the previous one had been downloaded.
In this patch, LocalizerRunner no longer waits for the previous resource to be downloaded. ContainerLocalizer handles this by submitting each download task to its CompletionService, which queues the tasks until they can be executed. The download thread pool behind the CompletionService is still a single-thread executor, so resources are still downloaded one at a time.
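To illustrate the queueing behaviour described above, here is a minimal standalone sketch using the JDK's ExecutorCompletionService over Executors.newSingleThreadExecutor(). The class name, the resource names and the sleep-based "download" are hypothetical; this is not the ContainerLocalizer code from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SerialDownloadSketch {

  // Submits all resources up front and returns the order in which their downloads finished.
  // Downloads are simulated with sleeps; a name starting with "large" takes longer.
  static List<String> downloadAll(List<String> resources)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CompletionService<String> cs = new ExecutorCompletionService<>(pool);
    try {
      for (String r : resources) {
        // submit() returns immediately; the task waits in the executor's queue
        // until the single worker thread is free.
        cs.submit(() -> {
          Thread.sleep(r.startsWith("large") ? 50 : 5);
          return r;
        });
      }
      List<String> done = new ArrayList<>();
      for (int i = 0; i < resources.size(); i++) {
        done.add(cs.take().get()); // blocks until the next download completes
      }
      return done;
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws Exception {
    // Prints [large.tar.gz, a.jar, b.conf]
    System.out.println(downloadAll(List.of("large.tar.gz", "a.jar", "b.conf")));
  }
}
```

Because there is only one worker thread, the large archive still finishes first even though the later resources are quicker to fetch: submissions no longer block, but the downloads themselves stay serial.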
As a result, ContainerLocalizer may send multiple LocalResourceStatus entries to LocalizerRunner in a single heartbeat. In that case, I think LocalizerRunner should still look for the next resources to localize, even when it receives FETCH_PENDING.
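A simplified, hypothetical sketch of the LocalizerRunner side of this idea follows. The Status enum, the ResourceStatus record and processHeartbeat are illustrative stand-ins, not the actual ResourceLocalizationService or LocalResourceStatus API, and the sketch hands out at most one resource per heartbeat for simplicity. It only shows that every status in a heartbeat is processed and that a FETCH_PENDING status does not block handing out the next queued resource.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class HeartbeatSketch {
  // Stand-in for the status kinds named in the discussion; not Hadoop's actual type.
  enum Status { FETCH_SUCCESS, FETCH_PENDING, FETCH_FAILURE }

  // Hypothetical per-resource status carried in a heartbeat.
  record ResourceStatus(String resource, Status status) {}

  private final Deque<String> toLocalize;
  final List<String> localized = new ArrayList<>();
  final List<String> failed = new ArrayList<>();

  HeartbeatSketch(List<String> resources) {
    toLocalize = new ArrayDeque<>(resources);
  }

  // Handles every status in the heartbeat, then returns the next resource to hand out
  // (null when none are left). FETCH_PENDING does not stop the hand-out.
  String processHeartbeat(List<ResourceStatus> statuses) {
    for (ResourceStatus s : statuses) {
      switch (s.status()) {
        case FETCH_SUCCESS -> localized.add(s.resource());
        case FETCH_FAILURE -> failed.add(s.resource());
        case FETCH_PENDING -> { } // still downloading on the localizer side; keep going
      }
    }
    return toLocalize.poll();
  }

  public static void main(String[] args) {
    HeartbeatSketch runner = new HeartbeatSketch(List.of("large.tar.gz", "a.jar", "b.conf"));
    System.out.println(runner.processHeartbeat(List.of())); // large.tar.gz
    System.out.println(runner.processHeartbeat(
        List.of(new ResourceStatus("large.tar.gz", Status.FETCH_PENDING)))); // a.jar
    System.out.println(runner.processHeartbeat(
        List.of(new ResourceStatus("large.tar.gz", Status.FETCH_PENDING)))); // b.conf
    // One heartbeat carrying several statuses once the queued downloads have finished.
    System.out.println(runner.processHeartbeat(List.of(
        new ResourceStatus("large.tar.gz", Status.FETCH_SUCCESS),
        new ResourceStatus("a.jar", Status.FETCH_SUCCESS),
        new ResourceStatus("b.conf", Status.FETCH_SUCCESS)))); // null
    System.out.println(runner.localized); // [large.tar.gz, a.jar, b.conf]
  }
}
```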
I have tested this on a real cluster, specifying a large archive that takes a long time to localize. The results show that the resources were localized serially, and that a single heartbeat carried the statuses of multiple small files (thus reducing the number of heartbeats).
Could you fix this format
My bad, I will fix this.