Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
1.1.1, 2.0.2-alpha
-
None
-
Reviewed
Description
This issue is similar to HADOOP-7153. It was found when using MRUnit - see MRUNIT-158, specifically https://issues.apache.org/jira/browse/MRUNIT-158?focusedCommentId=13501985&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13501985
–
o.a.h.io.SortedMapWritable implements the java.util.Map interface, however it does not define an implementation of the equals() or hashCode() methods; instead the default implementations in java.lang.Object are used.
This violates the contract of the Map interface which defines different behaviour for equals() and hashCode() than Object does. More information here: http://download.oracle.com/javase/6/docs/api/java/util/Map.html#equals(java.lang.Object)
The practical consequence is that SortedMapWritables containing equal entries cannot be compared properly. We were bitten by this when trying to write an MRUnit test for a Mapper that outputs MapWritables; the MRUnit driver cannot test the equality of the expected and actual MapWritable objects.
Attachments
Attachments
- hadoop-9124-branch1.patch
- 4 kB
- Karthik Kambatla
- HADOOP-9124.patch
- 0.6 kB
- Surenkumar Nihalani
- HADOOP-9124.patch
- 4 kB
- Surenkumar Nihalani
- HADOOP-9124.patch
- 4 kB
- Surenkumar Nihalani
- HADOOP-9124.patch
- 5 kB
- Surenkumar Nihalani
- HADOOP-9124.patch
- 4 kB
- Surenkumar Nihalani
- HADOOP-9124.patch
- 5 kB
- Surenkumar Nihalani
- HADOOP-9124.patch
- 5 kB
- Surenkumar Nihalani
- HADOOP-9124.patch
- 4 kB
- Surenkumar Nihalani
- HADOOP-9124.patch
- 5 kB
- Surenkumar Nihalani
- HADOOP-9124.patch
- 5 kB
- Thomas White
Issue Links
- blocks
-
HADOOP-9154 SortedMapWritable#putAll() doesn't add key/value classes to the map
- Closed
- is related to
-
HADOOP-7153 MapWritable violates contract of Map interface for equals() and hashCode()
- Closed
- relates to
-
MRUNIT-158 withOutput doesn't generate useful information with MapWritable on error
- Resolved
Activity
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12561403/HADOOP-9124.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1905//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1905//console
This message is automatically generated.
Hi Surenkumar,
Thanks for working on this. The patch looks good - it would be great to have an associated unit test too.
Hey Karthik,
sure. I'll add code that will test equals and hashCode.
Do I add testcases in hadoop-trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestSortedMapWritable.java or I create another test file?
sorry for noob question. This is my first contribution.
Suren, Having all the tests corresponding to SortedMapWritable in TestSortedMapWritable is more convenient, easy to find.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12561459/HADOOP-9124.patch
against trunk revision .
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1908//console
This message is automatically generated.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12561462/HADOOP-9124.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1909//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1909//console
This message is automatically generated.
Thanks Suren for the updated patch. It looks mostly good, but for some nits.
- It is nicer to import only used methods in Assert and not Assert.*
- Tests should have timeouts
- The failure reasons for equals and hashcode should be different - should a test fail, the message should be clear enough.
- assertFalse(condition) can be used in places with assertTrue(!condition)
- Though basic null checks might not be required; if we are checking for mapA not null, we should check for mapB also to be not null
Can you clarify on fifth point? I seem to have checked for (not null) on both maps. This patch should address the rest of the points.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12561963/HADOOP-9124.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1915//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1915//console
This message is automatically generated.
Thanks Suren - the fifth point seems to have already been addressed.
Few more nits (hopefully the last ):
- We can be a little conservative on the test timeouts - they shouldn't timeout just because we run them on a slow/loaded machine. Do you think 200 ms is conservative enough? If not, we can may be bump it up to 1s.
- The indentation seems to be off on a few lines.
So, the sole purpose of timeout is to have a timeout/sanity? there is not much logic behind the timeout value, right?
It's just enough that method completes?
Yes. The timeout should be high enough that we don't hit on expected runs of the test even on slow systems, and low enough to bail out early if there is something terribly wrong going on with the test.
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12562018/HADOOP-9124.patch
against trunk revision .
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1916//console
This message is automatically generated.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12562134/HADOOP-9124.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1921//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1921//console
This message is automatically generated.
Hi Suren,
In HADOOP-9154, I propose moving implementation of Map methods from (Sorted)MapWritable to AbstractMapWritable. The preliminary patch addresses this issue as well. Do you want to work together on that patch to add the required tests and get it in.
I am also very much open to getting this in first, and finishing the other JIRA.
Thanks
Karthik
Hey Karthik,
I would love to work together. I am continuing the discussion on HADOOP-9154. There is no point on getting this in first if HADOOP-9154 already addresses it.
Thanks,
Suren.
The implementation of equals will work, but it doesn't optimize for the case when an object is being compared to itself by checking with ==. HADOOP-7153 solved the same problem for MapWritable so it probably makes sense to use the same fix.
The following assertion in the test is not guaranteed by the contract for hashCode. It might happen to pass, but we should not have it as a part of the test.
+ failureReason = "Two SortedMapWritables with different data have same hashCode"; + assertFalse(failureReason, mapA.hashCode() == mapB.hashCode());
On second thought, I think #equals() should verify if classToIdMap.keySet() and idToClassMap.values() should match for the two objects being compared. No?
tomwhite, I agree about that test. That case is neither likely to happen nor useful.
kkambatl, I am not sure of the purpose of classToIdMap and idToClassMap . Can you elaborate?
I'll update and upload the patch once Karthik's point is clear.
As I understand, both SortedMapWritable and MapWritable keep track of the classes being used as keys and values to facilitate writing/reading to/from streams (disk).
If we are comparing mapA and mapB, I am not certain if the bytes written to disk through #write() need to be identical. If they are supposed to be, then we need to compare classToIdMap and idToClassMap. tomwhite, what do you think?
I was looking at the equals(Object o) implementation of AbstractMap here. It iterates through and checks if each Key and Value mapping is in the map to checked.
Now, each Key and Value would rely on instanceof to check if the class is there. So, I believe we should be good.
I was looking at AbstractMapWritable, the class id is just an auto incremented byte. So, different *MapWritables have added Writables in different order won't lead to same class-id mappings. Also, After a mapping is removed, there is no reference counting to remove the class-id mapping that is no longer needed. Hence, we should not check classToIdMap & idToClassMap
Let me know if I missed any edge case or anything.
So, different *MapWritables have added Writables in different order won't lead to same class-id mappings. Also, After a mapping is removed, there is no reference counting to remove the class-id mapping that is no longer needed. Hence, we should not check classToIdMap & idToClassMap
I agree. According to http://docs.oracle.com/javase/6/docs/api/java/util/Map.html#equals(java.lang.Object)
two maps m1 and m2 represent the same mappings if m1.entrySet().equals(m2.entrySet())
So only the values of the entry set should be used for the basis for testing equality.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12566787/HADOOP-9124.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2103//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2103//console
This message is automatically generated.
Thanks for the clarification, Tom. Suren - the patch looks good to me. Thanks.
+1
snihalani, looks like this applies to branch-1 as well.
I didn't know about branches in any of the HowToContribute wiki pages. Where can I read up on them?
What about my comment above about mirroring the change from HADOOP-7153?
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12566969/HADOOP-9124.patch
against trunk revision .
-1 patch. The patch command could not apply the patch.
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2112//console
This message is automatically generated.
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12566975/HADOOP-9124.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2113//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2113//console
This message is automatically generated.
Thanks Suren. I've attached a patch with a modification to SortedMapWritable's equals method so that it is similar to the one in MapWritable. Does this look OK to you?
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12567368/HADOOP-9124.patch
against trunk revision .
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 1 new or modified test files.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 eclipse:eclipse. The patch built with eclipse:eclipse.
+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.
+1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/2125//testReport/
Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/2125//console
This message is automatically generated.
I just committed this. Thanks Suren for the contribution, and Karthik for the reviews.
Integrated in Hadoop-trunk-Commit #3310 (See https://builds.apache.org/job/Hadoop-trunk-Commit/3310/)
HADOOP-9124. SortedMapWritable violates contract of Map interface for equals() and hashCode(). Contributed by Surenkumar Nihalani (Revision 1441475)
Result = SUCCESS
tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441475
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SortedMapWritable.java
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestSortedMapWritable.java
Thanks Tom for committing this.
Suren - sorry for the delay. branch-1 is the dev branch for 1.x.y releases. Similarly, branch-2 is for 2.x.y. The directory structure for trunk, branch-2 and branch-0.23 is the same, but branch-1 is different. Hence, the need for a separate branch-1 patch.
I have copied the changes from here verbatim to branch-1, verified the test passes. Suren - are you OK with this?
Tom - can you commit this to branch-1?
There is a very small issue:
My latest attachment had this:
+ // entrySets are now same + failureReason = "Two SortedMapWritables with same entry sets formed in different order have different hashcode"; + assertEquals(failureReason, mapA.hashCode(), mapB.hashCode()); + failureReason = "Two SortedMapWritables with same entry sets formed in different order are no longer equal"; + assertEquals(failureReason, mapA, mapB); + assertEquals(failureReason, mapB, mapA);
while Karthik and Tom, your latest attachments have this,
+ // entrySets are now same + failureReason = "Two SortedMapWritables with same entry sets formed in different order are now different"; + assertEquals(failureReason, mapA.hashCode(), mapB.hashCode()); + assertTrue(failureReason, mapA.equals(mapB)); + assertTrue(failureReason, mapB.equals(mapA));
I don't know how branches end up merging, but, since kkambatl mentioned it during code review, so, I thought it was worth to point out.
snihalani, now that the patch is committed to trunk and branch-0.23, I think it would be best not to change it.
Integrated in Hadoop-Yarn-trunk #115 (See https://builds.apache.org/job/Hadoop-Yarn-trunk/115/)
HADOOP-9124. SortedMapWritable violates contract of Map interface for equals() and hashCode(). Contributed by Surenkumar Nihalani (Revision 1441475)
Result = SUCCESS
tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441475
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SortedMapWritable.java
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestSortedMapWritable.java
Integrated in Hadoop-Hdfs-0.23-Build #513 (See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/513/)
HADOOP-9124. SortedMapWritable violates contract of Map interface for equals() and hashCode(). (Surenkumar Nihalani via tgraves) (Revision 1441575)
Result = SUCCESS
tgraves : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441575
Files :
- /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/CHANGES.txt
- /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SortedMapWritable.java
- /hadoop/common/branches/branch-0.23/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestSortedMapWritable.java
Integrated in Hadoop-Hdfs-trunk #1304 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1304/)
HADOOP-9124. SortedMapWritable violates contract of Map interface for equals() and hashCode(). Contributed by Surenkumar Nihalani (Revision 1441475)
Result = SUCCESS
tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441475
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SortedMapWritable.java
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestSortedMapWritable.java
Integrated in Hadoop-Mapreduce-trunk #1332 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1332/)
HADOOP-9124. SortedMapWritable violates contract of Map interface for equals() and hashCode(). Contributed by Surenkumar Nihalani (Revision 1441475)
Result = SUCCESS
tomwhite : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1441475
Files :
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/SortedMapWritable.java
- /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestSortedMapWritable.java
SortedMapWritable uses a TreeMap internally. TreeMap inherits the needed implementation of equals() and hashCode() from AbstractMap. I am planning to write corresponding wrapper methods. Is there a better way to implement this?