ValuesIterator.next() does not return a new object for each element; it returns the same instance every time.
Users expect it to return a new object. For example, java.lang.String.equals() begins with an identity check, "if (this == anObject) return true;".
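Because next() changes the private fields of a single object and returns that same object every time, any two values obtained from the iterator are the same reference. this==anObject is therefore always true, and equals() returns true no matter what the values contained when they were read.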
The reduce() method hands the user (a subclass of) ValuesIterator, so every user-defined class can be affected.
Manifestations of this bug in 0.17.0:
- The Text class implements equals() similarly to String.equals(), including the "if (this==anObject) return true;" check.
- contrib/data_join breaks because it stores tags in a Map, and the behavior of next() makes equals() return true for all tags.
Patch against hadoop-0.17.0:
I would imagine that a programmer would be very confused to find that all values compare equal, whatever the text input.
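The confusion can be reproduced without Hadoop. The following is a self-contained stand-in, not Hadoop code: the hypothetical MutableText plays the role of Text (its equals() uses the String-style identity check), and the hypothetical ReusingIterator plays the role of ValuesIterator by overwriting and returning one instance from next(). A reduce-like loop that keeps the values it sees ends up holding three references to the same object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReuseDemo {

    // Hypothetical stand-in for Text: mutable, with an equals() that starts
    // with the same identity check as String.equals().
    static final class MutableText {
        private String contents = "";

        void set(String s) { contents = s; }

        @Override
        public boolean equals(Object anObject) {
            if (this == anObject) return true; // identity shortcut
            if (!(anObject instanceof MutableText)) return false;
            return contents.equals(((MutableText) anObject).contents);
        }

        @Override
        public int hashCode() { return contents.hashCode(); }

        @Override
        public String toString() { return contents; }
    }

    // Hypothetical stand-in for ValuesIterator: next() overwrites the fields
    // of one instance and returns that same instance every time.
    static final class ReusingIterator implements Iterator<MutableText> {
        private final Iterator<String> source;
        private final MutableText reused = new MutableText();

        ReusingIterator(List<String> inputs) { source = inputs.iterator(); }

        @Override
        public boolean hasNext() { return source.hasNext(); }

        @Override
        public MutableText next() {
            if (!source.hasNext()) throw new NoSuchElementException();
            reused.set(source.next());
            return reused;
        }
    }

    // Mimics a reduce() body that keeps every value it is handed.
    static List<MutableText> collect(Iterator<MutableText> values) {
        List<MutableText> kept = new ArrayList<>();
        while (values.hasNext()) {
            kept.add(values.next());
        }
        return kept;
    }

    public static void main(String[] args) {
        List<MutableText> kept =
            collect(new ReusingIterator(Arrays.asList("apple", "banana", "cherry")));
        System.out.println(kept);                            // [cherry, cherry, cherry]
        System.out.println(kept.get(0) == kept.get(2));      // true: same reference
        System.out.println(kept.get(0).equals(kept.get(1))); // true
    }
}
```

Whatever the input strings are, the list prints the last value read for every element, and any two elements compare equal.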
Possible solutions:
- Return a new object from each call to next(). This might add significant allocation overhead.
- Return a new object for unknown types and reuse the same object for known types (such as Text), removing the "if (this==anObject) return true;" check from the equals() methods of those known types.
- Document clearly that every user-defined class must implement an equals() method that does not perform the "if (this==anObject) return true;" check (i.e., push the problem to the user).
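The first option can be sketched the same way. This is a hypothetical illustration, not the patch itself: MutableText and FreshIterator are made-up names, and each call to next() allocates a new MutableText, which is where the extra overhead would come from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class FreshObjectDemo {

    // Hypothetical mutable value type, same shape as a Text-like class.
    static final class MutableText {
        private String contents = "";

        void set(String s) { contents = s; }

        @Override
        public boolean equals(Object anObject) {
            if (this == anObject) return true;
            if (!(anObject instanceof MutableText)) return false;
            return contents.equals(((MutableText) anObject).contents);
        }

        @Override
        public int hashCode() { return contents.hashCode(); }

        @Override
        public String toString() { return contents; }
    }

    // Option 1 (hypothetical): next() allocates a fresh object for every
    // element, so values kept by the caller are never overwritten.
    static final class FreshIterator implements Iterator<MutableText> {
        private final Iterator<String> source;

        FreshIterator(List<String> inputs) { source = inputs.iterator(); }

        @Override
        public boolean hasNext() { return source.hasNext(); }

        @Override
        public MutableText next() {
            if (!source.hasNext()) throw new NoSuchElementException();
            MutableText fresh = new MutableText(); // one allocation per value
            fresh.set(source.next());
            return fresh;
        }
    }

    // Mimics a reduce() body that keeps every value it is handed.
    static List<MutableText> collect(Iterator<MutableText> values) {
        List<MutableText> kept = new ArrayList<>();
        while (values.hasNext()) {
            kept.add(values.next());
        }
        return kept;
    }

    public static void main(String[] args) {
        List<MutableText> kept =
            collect(new FreshIterator(Arrays.asList("apple", "banana", "cherry")));
        System.out.println(kept);                            // [apple, banana, cherry]
        System.out.println(kept.get(0).equals(kept.get(1))); // false
    }
}
```

The identity check in equals() is left in place here; with distinct objects per element it only short-circuits when an object is compared with itself.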