Description
I think I found the reason why the generator returns with an empty fetchlist for small fetchsizes.
After the first job finishes running, the generator checks the following condition to see if it got an empty list:
if (readers == null || readers.length == 0 || !readers[0].next(new FloatWritable())) {
The third condition is incorrect. In some cases, especially for small fetchlists, the first partition may be empty while other partitions contain URLs. The Generator then incorrectly assumes that all partitions are empty after looking only at the first. The same problem can occur when all URLs in the fetchlist come from the same host (or from a very small number of hosts, or from hosts that all map to a small number of partitions).
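A possible fix is to treat the fetchlist as empty only when every partition is empty. The sketch below illustrates the difference using plain java.util.Iterator objects as stand-ins for the per-partition readers. The class and method names (FetchlistCheck, isEmptyFirstOnly, isEmptyAll) are hypothetical and not part of Nutch; in the Generator itself the loop would presumably call next(...) on each element of readers instead of hasNext().

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in: each Iterator<String> plays the role of one
// partition reader; hasNext() plays the role of reader.next(...).
class FetchlistCheck {

    // Mirrors the condition quoted above: only partition 0 is inspected.
    static boolean isEmptyFirstOnly(List<Iterator<String>> partitions) {
        return partitions == null
            || partitions.isEmpty()
            || !partitions.get(0).hasNext();
    }

    // Suggested behaviour: the fetchlist is empty only if no partition
    // contains an entry.
    static boolean isEmptyAll(List<Iterator<String>> partitions) {
        if (partitions == null) {
            return true;
        }
        for (Iterator<String> partition : partitions) {
            if (partition.hasNext()) {
                return false; // at least one partition has a URL
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // First partition is empty, second holds one URL.
        List<Iterator<String>> partitions = Arrays.asList(
            Collections.<String>emptyList().iterator(),
            Arrays.asList("http://example.com/").iterator());

        System.out.println("first-only check says empty: " + isEmptyFirstOnly(partitions));
        System.out.println("all-partitions check says empty: " + isEmptyAll(partitions));
    }
}
```

With an empty first partition and a non-empty second one, the first-only check reports an empty fetchlist while the all-partitions check does not, which matches the behaviour described above.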