Description
The initial query run by the generator task against MongoDB doesn't force ordering by _id. This causes the ranges for the successive map-reduce queries to be selected incorrectly. The successive queries do appear to run in the correct order, since _id is always indexed, but they should also specify an explicit sort, because MongoDB doesn't guarantee any particular order otherwise. I didn't dig deep enough to determine whether the root of the problem lies in Nutch or in Gora, or whether it affects only MongoDB or other databases as well.
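The effect on range selection can be illustrated without MongoDB. The sketch below is a hypothetical, simplified model and not Nutch's or Gora's actual partitioning code. It assumes that split boundaries are chosen by taking every N-th key from the initial query result. If the keys come back unsorted, the boundaries can be non-monotonic, so some keys land in two ranges while other ranges are empty. Sorting first, which is what an explicit sort on _id would do, yields a clean partition. The class and method names (`RangeSplitSketch`, `splitPoints`, `rangesContaining`) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RangeSplitSketch {

    // Hypothetical helper: choose (numSplits - 1) boundary keys by taking every
    // step-th key from the query result, optionally sorting by key first
    // (the in-memory equivalent of an explicit sort on _id).
    static List<String> splitPoints(List<String> keys, int numSplits, boolean sortFirst) {
        List<String> work = new ArrayList<>(keys);
        if (sortFirst) {
            Collections.sort(work);
        }
        List<String> points = new ArrayList<>();
        int step = work.size() / numSplits;
        for (int i = 1; i < numSplits; i++) {
            points.add(work.get(i * step));
        }
        return points;
    }

    // Counts how many of the ranges (-inf, p0), [p0, p1), ..., [pLast, +inf)
    // contain the key. A valid partitioning puts every key in exactly one range.
    static int rangesContaining(String key, List<String> points) {
        int count = 0;
        for (int i = 0; i <= points.size(); i++) {
            String lo = (i == 0) ? null : points.get(i - 1);
            String hi = (i == points.size()) ? null : points.get(i);
            boolean aboveLo = lo == null || key.compareTo(lo) >= 0;
            boolean belowHi = hi == null || key.compareTo(hi) < 0;
            if (aboveLo && belowHi) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Keys in the order an unsorted query happened to return them.
        List<String> returned = List.of("f", "e", "d", "c", "b", "a");
        for (boolean sortFirst : new boolean[] {false, true}) {
            List<String> points = splitPoints(returned, 3, sortFirst);
            System.out.println("sortFirst=" + sortFirst + " points=" + points);
            for (String key : returned) {
                System.out.println("  " + key + " in "
                        + rangesContaining(key, points) + " range(s)");
            }
        }
    }
}
```

Without sorting, the boundaries come out as `[d, b]`, so `b` and `c` fall into two ranges and `[d, b)` is empty. With sorting they are `[c, e]`, and every key falls into exactly one range. With the MongoDB Java driver, an explicit sort would look like `collection.find(filter).sort(Sorts.ascending("_id"))`; where such a call belongs in Nutch or Gora is not established here.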