Issue Details (XML | Word | Printable)

Key: COLLECTIONS-266
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Unassigned
Reporter: Julien Buret
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Commons Collections

Issue with MultiKey when serialized/deserialized via RMI

Created: 11/Sep/07 09:02 PM   Updated: 30/May/08 06:24 AM
Return to search
Component/s: KeyValue
Affects Version/s: 3.2
Fix Version/s: 4.0

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works CC-266-final.patch 2008-05-13 11:01 AM Joerg Schaible 4 kB
Text File Licensed for inclusion in ASF works COLLECTIONS-266.patch 2007-09-12 05:21 PM Henri Yandell 2 kB
File Licensed for inclusion in ASF works collections-testcase-266.diff 2007-09-19 08:10 AM Joerg Schaible 3 kB
Java Source File Licensed for inclusion in ASF works MultiKey.java 2007-09-11 09:04 PM Julien Buret 9 kB
Java Source File Licensed for inclusion in ASF works TestCollections266.java 2007-09-13 11:16 PM Henri Yandell 2 kB
Java Source File Licensed for inclusion in ASF works TestCollections266.java 2007-09-13 07:56 AM Julien Buret 3 kB
Java Source File Licensed for inclusion in ASF works TestCollections266.java 2007-09-12 11:44 PM Henri Yandell 2 kB

Resolution Date: 30/May/08 06:24 AM


 Description  « Hide
This is because the hash code of MultiKey is calculated only once.

So if the MultiKey is deserialized in an other jvm, and if one at least of the subkeys defines its hash code with System.identityHashCode() (for example all the enums does), then the hash code of the MultiKey is no longer valid, and you can't retreive the key in your Map.

I fixed it by making the cached hash code field transient, and by recalculating the hash code during deserialization.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Julien Buret added a comment - 11/Sep/07 09:04 PM
Here is the updated source file.
Hope this help.

Henri Yandell added a comment - 12/Sep/07 05:21 PM
Julien's fix as a patch.

Henri Yandell added a comment - 12/Sep/07 05:24 PM
The patch seems good to me. Need to make a unit test and then apply.

Henri Yandell added a comment - 12/Sep/07 11:44 PM
While creating the test I realized that the use case that found this problem seems quite rare.
  • MultiKey goes into Map.
  • Map gets sent through serialize/deserialize.
  • We have a new map, with a new multikey inside, with a new object inside that, and the multikey has based its hashCode on the old address of the object and not the new one.

If you dig that object out of the map, and then use it to try and get itself back out of the map; then you've got a problem. So definitely a bug of sorts.

But how did you get access to the object in the first place?

In my unit test, I transfer the Map, then have to get the key back out of the map to then use in a get request. ie:

MultiKey mk2 = (MultiKey) map2.keySet().iterator().next();
assertEquals(TWO, map2.get(mk2));

I find that the test passes for both the old code and your new code. Any idea what I'm doing differently?


Henri Yandell added a comment - 12/Sep/07 11:44 PM
Attaching the unit test.

Julien Buret added a comment - 13/Sep/07 07:56 AM
I have updated the test and now it fails before the patch and is successful after.

In the test, the hash code of KEY_266 depends of the current (simulated) JVM (like System.identityHashCode() ).

HTH


Henri Yandell added a comment - 13/Sep/07 11:15 PM
Thanks Julien.

Digging into it, I was a bit confused by the isJVM1 flag as it makes both the deserialized object and the TEST_266 object have the same hashCode. Then I realized that's probably how enums work, so fits your use case above.

I think this is a special case of a bigger and simpler unit test that uses the natural hashCode of the object (ie: same as System.identityHashCode). The current patch fails for that unit test. I'll attach the test for your thoughts.


Joerg Schaible added a comment - 14/Sep/07 06:14 AM
IMHO the key simply violates the contract. Extract from Javadoc to Object.hashCode:
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result

It also states:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

Without testing it, but if you use this key not as part of a MultiKey, but of a HashMap directly, you might get the same results.


Julien Buret added a comment - 14/Sep/07 07:20 AM - edited
For Joerg: Here is the code of equals() and hashCode() methods of class Enum in the sun 1.5 jvm:

public final boolean equals(Object other) { return this==other; }

public final int hashCode() { return System.identityHashCode(this); }

I think (and I hope ) that the class Enum does not violate the hashCode contract - but you can see that the same enum will not have the same hashCode in two different jvms. The conclusion is : never serialize the hashCode (at least for a modular class like MultiKey).

And the HashMap will work fine in this case, because in its writeObject() and readObject() methods, the hashCode of each key is not serialized/deserialized: only the key, the value and the size of the map are serialized: It works, I have tested it.

Sorry for the multiple edits, but what I would like is to underline this sentence of the hashCode contract : "This integer need not remain consistent from one execution of an application to another execution of the same application."


Joerg Schaible added a comment - 14/Sep/07 08:51 AM
Well, this problem with Enums has a history:

However, in the end you're right and the hashCode should not have been stored in the MultiKey in this way. We might solve this by adding a readResolve method:

private Object readResolve() { return new MultiKey(keys, false); }

that way we create a new MultiKey with the correct hashCode. Your solution with the transient member will break the serialization compatibility, since you changed the binary layout. Therefore the hashCode member must be serialized - otherwise you have to change also the serialVersionUID. But with a private calculateHashCode method and setting the hashCode member not to final, we can implement readResolve different:

private Object readResolve() { calculateHashCode(); return this; }

But our clirr report may still choke about the final.

  • Jörg

Julien Buret added a comment - 14/Sep/07 09:49 AM
If the final modifier is a problem, an other solution could be to add a transient field "hashCode2" and no longer use the old field "hashCode " in the class (just keep it for compatibility).

There is no mention of the final keyword in the serialization spec about compatible or incompatible changes:

http://java.sun.com/javase/6/docs/platform/serialization/spec/version.html#5172


Joerg Schaible added a comment - 14/Sep/07 09:57 AM
Ah, well, the "final" modifier was meant as problem for binary compatibility itself, not for binary serialization compatibility

Julien Buret added a comment - 14/Sep/07 10:12 AM

Stephen Colebourne added a comment - 17/Sep/07 08:30 AM
Serialization is actually quite clever. You can change a field to transient, and keep the same serialVersionUID without a problem IIRC. And in this case, it doesn't matter if the serialVersionUID is changed, as the current code is broken.

Henri Yandell added a comment - 18/Sep/07 06:56 PM
So... summarizing:
  • We want it to remain broken for normal natural hashCodes, as not keeping those in line with the spec is broken.
  • We want to fix it for enums though, as they are special - and Julien's test case is good there because it models the specialness with the isJVM1 flag.
  • Use Julien's fix because the move to transient doesn't break compat.

Stephen/Joerg???


Joerg Schaible added a comment - 19/Sep/07 08:10 AM
Patch to improve test case by simulating Enum behaviour.

Joerg Schaible added a comment - 19/Sep/07 08:21 AM
1/ You cannot fix natural hash codes in general. It works for Enums since they use always the same instance in the same VM.
2/ I've added a test case that does something similar ... I missed Julian's TC, but yes, that test would be sufficient also
3/ I had complaints against adding the "transient". I can run the TestAllPackages suite though and I assume (although I did not find where) that it also contains compatibility tests for serialization, since there are such objects in the data/test directory. This would prove Stephen's comment right that Java serialization can deal with the situation - at least in one direction. But I doubt it will work in the other direction i.e. an old version of CC can read such a serialized object. Therefore I'd simply remove the final. And IMHO it matters if the serialVersionUID changes, since the current code is only broken for a special use case

Henri Yandell added a comment - 13/May/08 07:22 AM
Joerg - do you remember enough of this issue to put together a patch for your alternative fix?

Joerg Schaible added a comment - 13/May/08 11:01 AM
I've added a new patch that combines all changes for the main source and the test case.

Henri Yandell added a comment - 30/May/08 06:24 AM
svn ci -m "Applying Joerg's final patch from COLLECTIONS-266, including the unit test that shows the problem and fixes the problem by making the hashcode transient, and moving the hashcode implementation such that it can be called from the deserialization as well as the hashcode method" src

Sending src/java/org/apache/commons/collections/keyvalue/MultiKey.java
Sending src/test/org/apache/commons/collections/keyvalue/TestMultiKey.java
Transmitting file data ..
Committed revision 661577.