Description
DeleteDuplicates.HashPartitioner.reduce():
// byScore case
if (value.score > highest.score) {
  highest.keep = false;
  LOG.debug("-discard " + highest + ", keep " + value);
  output.collect(highest.url, highest); // delete highest
  highest = value;
}
// !byScore is also similar
So assume two docs with the same hash are in values. If the first has a lower score than the second, then the first doc will be deleted. But if the first has a higher score than the second, then neither will be deleted, because nothing happens when the condition is false. AFAICS, there should be an else branch that deletes value and keeps highest as it is.
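
A minimal sketch of the proposed else branch for the byScore case, to make the suggestion concrete. It is not the actual Nutch code: the Doc class, the reduceByScore method and the returned list of discarded docs are hypothetical stand-ins for the real record type and for output.collect(), and it assumes the iterator hands out distinct objects.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class DedupSketch {
    // Hypothetical stand-in for the record type used in reduce().
    static class Doc {
        final String url;
        final float score;
        boolean keep = true;

        Doc(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    // Walks all docs that share one hash and returns the ones marked for
    // deletion. The returned list plays the role of output.collect().
    static List<Doc> reduceByScore(Iterator<Doc> values) {
        List<Doc> discarded = new ArrayList<>();
        Doc highest = null;
        while (values.hasNext()) {
            Doc value = values.next();
            if (highest == null) {
                highest = value; // first doc seen for this hash
                continue;
            }
            if (value.score > highest.score) {
                highest.keep = false;
                discarded.add(highest); // delete the previous highest
                highest = value;
            } else {
                // Proposed else branch: value loses, highest stays as it is.
                // On a tie the doc seen first is kept (an assumption).
                value.keep = false;
                discarded.add(value);
            }
        }
        return discarded;
    }
}
```

With this change, exactly one doc per hash survives regardless of the order in which the values arrive. The !byScore case would need the same kind of else branch.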