I do not understand your solution.
When I simply add the following line to the getFetchItem() method in the FetchItemQueue class, I see impolite requests to the same host:
it = queue.remove(0);
We could multiply minCrawlDelay or crawlDelay and maxThreads by the number of map tasks, but there is no coordination between tasks, and each task does not receive an equal number of URLs from each host.
I also found a bug in the selector reduce task of the generate phase, which results from the lack of coordination between tasks.
To address these problems I use a redis-server, which is a fast data server for maintaining (key, value) pairs.
Redis maintains some variables such as delay, maxThreads, ... for each host, and they can be set dynamically according to the rate of successful and blocked requests for each host.
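To make the uneven-distribution problem concrete, here is a rough arithmetic sketch in Java. The class and method names are made up for illustration, and the numbers (a 5 second delay, 4 map tasks) are assumed values, not taken from any real configuration. It only computes the average gap between requests to one host when several uncoordinated tasks each wait their own delay.

```java
public class ScaledDelayExample {
    // Average gap between requests to one host when `tasksWithHost`
    // independent tasks each wait `perTaskDelayMs` between their own requests.
    // This is an average only; without coordination two tasks can still
    // hit the host at the same instant.
    static double averageGapMs(long perTaskDelayMs, int tasksWithHost) {
        return (double) perTaskDelayMs / tasksWithHost;
    }

    public static void main(String[] args) {
        long crawlDelayMs = 5000; // assumed per-host delay
        int mapTasks = 4;         // assumed number of map tasks
        long scaledDelayMs = crawlDelayMs * mapTasks;

        // Unscaled delay, host present in all 4 tasks: too aggressive.
        System.out.println(averageGapMs(crawlDelayMs, 4));  // prints 1250.0
        // Scaled delay, host present in all 4 tasks: matches the intended delay.
        System.out.println(averageGapMs(scaledDelayMs, 4)); // prints 5000.0
        // Scaled delay, host present in only 1 task: four times too slow.
        System.out.println(averageGapMs(scaledDelayMs, 1)); // prints 20000.0
    }
}
```

So scaling by the number of map tasks is only correct when every task holds URLs for the host, which is not guaranteed.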
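A minimal sketch of the idea in Java follows. It is not my actual code: an in-memory map stands in for Redis so the example runs on its own, and in a real setup each host's fields would live in Redis so that all fetcher tasks share them. The class name, the field names, and the adjustment policy (double the delay and drop to one thread on a block, shrink the delay by 10% on a success) are all assumptions chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HostPolitenessStore {
    // Per-host variables; with Redis these would be fields stored under a per-host key.
    static final class HostState {
        long delayMs;
        int maxThreads;
        long successes;
        long blocks;

        HostState(long delayMs, int maxThreads) {
            this.delayMs = delayMs;
            this.maxThreads = maxThreads;
        }
    }

    // Stand-in for the shared Redis server (assumption: local map for the demo).
    private final Map<String, HostState> hosts = new ConcurrentHashMap<>();
    private final long minDelayMs;
    private final long maxDelayMs;
    private final int defaultMaxThreads;

    public HostPolitenessStore(long minDelayMs, long maxDelayMs, int defaultMaxThreads) {
        this.minDelayMs = minDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.defaultMaxThreads = defaultMaxThreads;
    }

    private HostState state(String host) {
        return hosts.computeIfAbsent(host, h -> new HostState(minDelayMs, defaultMaxThreads));
    }

    // Record the outcome of one fetch and adjust the host's settings.
    public synchronized void report(String host, boolean blocked) {
        HostState s = state(host);
        if (blocked) {
            // Hypothetical policy: back off hard when the host blocks us.
            s.blocks++;
            s.delayMs = Math.min(maxDelayMs, s.delayMs * 2);
            s.maxThreads = 1;
        } else {
            // Hypothetical policy: recover slowly, never below the minimum delay.
            s.successes++;
            s.delayMs = Math.max(minDelayMs, s.delayMs - s.delayMs / 10);
        }
    }

    public synchronized long delayMs(String host) {
        return state(host).delayMs;
    }

    public synchronized int maxThreads(String host) {
        return state(host).maxThreads;
    }
}
```

Each fetcher task would read delayMs and maxThreads for a host before fetching and call report afterwards, so the settings follow the host's observed behavior instead of a fixed per-task configuration.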