Issue Details (XML | Word | Printable)

Key: JCR-320
Type: Bug Bug
Status: Open Open
Priority: Minor Minor
Assignee: Jukka Zitting
Reporter: Piyush Purang
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Jackrabbit Content Repository

BinaryValue equals fails for two objects with two different byte arrays that contain the same bytes.

Created: 16/Feb/06 03:37 AM   Updated: 14/Feb/08 01:46 PM
Component/s: jackrabbit-core
Affects Version/s: 0.9, 1.0, 1.0.1
Fix Version/s: None

Time Tracking:
Not Specified

Issue Links:
Duplicate
 


 Description  « Hide
http://svn.apache.org/repos/asf/incubator/jackrabbit/trunk/jackrabbit/src/main/java/org/apache/jackrabbit/value/BinaryValue.java


Here is the present implementation

    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj instanceof BinaryValue) {
            BinaryValue other = (BinaryValue) obj;
            if (text == other.text && stream == other.stream
                    && streamData == other.streamData) {
                return true;
            }
            // stream, streamData and text are mutually exclusive,
            // i.e. only one of them can be non-null
            if (stream != null) {
                return stream.equals(other.stream);
            } else if (streamData != null) {
                return streamData.equals(other.streamData);
            } else {
                return text.equals(other.text);
            }
        }
        return false;
    }


Here are the problems with the present implementation


1. streamData.equals(other.streamData ) will fail miserably.

2. too many return statements! I guess no one ran a checkstyle on it.

3. return stream.equals(other.stream); will always be false unless both have been created with the same InputStream!

I wrote some testcases in SetValueBinaryTest

    public void testBinaryValueEquals() throws Exception {
        BinaryValue binaryValue1 = null;
        BinaryValue binaryValue2 = null;

        // initialize some binary value
        data = createRandomString(10).getBytes();
        binaryValue1 = (BinaryValue) superuser.getValueFactory().createValue(new ByteArrayInputStream(data));
        binaryValue2 = (BinaryValue) superuser.getValueFactory ().createValue(new ByteArrayInputStream(data));
        //ideallly setup a test harness that tests all the cases as defined by the contract in Object.equals()
        assertTrue( binaryValue1.equals(binaryValue2));
        assertTrue( binaryValue1.equals(binaryValue1));
        assertFalse( binaryValue1.equals(null));
    }


    public void testBinaryValueEquals2() throws Exception {
        String str = createRandomString(10);
        BinaryValue binaryValue1 = new BinaryValue(str.getBytes());
        BinaryValue binaryValue2 = new BinaryValue(str.getBytes());

        assertTrue( binaryValue1.equals(binaryValue2));
        assertTrue( binaryValue1.equals(binaryValue1));
        assertFalse( binaryValue1.equals(null));
    }
    
    public void testBinaryValueEquals3() throws Exception {
        String str1 = "Some string xyz";
        String str2 = new StringBuffer().append("Some string xyz").toString();
        BinaryValue binaryValue1 = new BinaryValue(str1);
        BinaryValue binaryValue2 = new BinaryValue(str2);

        assertTrue( binaryValue1.equals(binaryValue2));
        assertTrue( binaryValue1.equals(binaryValue1));
        assertFalse( binaryValue1.equals(null));
    }


They were written quickly but with the present implementation the first two fail at the very first assert* statement which for stream I can understand (as we are basically propogating InputStream's equals contract) but for byte arrays I can't agree with this behaviour unless it is so intended. It behaves the best when BinaryVlaue wraps a string i.e. testBinaryValueEquals3() passes without trouble.

I propose a new implementation where I am not very convinced with the behaviour when we have streams being wrapped. If it should fail for the streams then we should change the documentation for the method. As for making it work right when streams are involved .. well the streams will have to be read and compared.

 
public boolean equals(Object obj) {
        boolean result = false;
        if (this == obj) {
            result = true;
        } else if (obj instanceof BinaryValue) {
            BinaryValue other = (BinaryValue) obj;
            // only one of text, stream and streamData are set at any given point
            if (text != null) {
                result = text.equals(other.text);
            } else if (stream != null) {
                result = stream.equals(other.stream);
            } else if (streamData != null) {
                result = Arrays.equals(streamData, other.streamData);
            } else {
                // all values are null; test that the other object (which could be us)
                // also has everything set to null!
                result = (other.text == null && other.stream == null
                        && other.streamData == null);
            }
        }
        return result;
    }

This implementation of course fails at the first assert* of the first test method testBinaryValueEquals (It will pass the other two assert*s).

And if this new implementation doesn't fit the bill and an alternative isn't needed then just skip implementing equals.

One more thought running through my mind is - if text, stream and data are mutually exclusive why don't we have different classes for each of them? (Try and wrap a stringValue into a binary value...)

I have also noticed that the hashCodes all return 0's throughtout the package. In the case that the hashCode can't keep up with the contract of equal I would propose throwing an UnsupportedOpertaionException. And if not then declare it in the BasValue as it will be inherited (unless this leads the QA tools to report infringement of the rule that when you define equals you need to redefine hashCode - checkstyle does that).

The value package doesn't have any tests for it.. or did I miss them?

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Jukka Zitting added a comment - 16/Feb/06 07:07 AM
You are right in that the current equals method is incorrect. I believe that the rationale for the current implementation is that instance equality is a good enough test for comparing binary streams. The correct way to implement the method is documented in the JCR Value javadocs:

    http://www.day.com/maven/jsr170/javadocs/jcr-1.0/javax/jcr/Value.html

Can you provide a patch of your improvements and test cases?

Alexandru Popescu added a comment - 16/Feb/06 07:20 AM
What about the hashCode() implementations? IMO returning 0 is equivalent with not having an equals method.

./alex
--
.w( the_mindstorm )p.

Piyush Purang added a comment - 16/Feb/06 05:19 PM
IMO returning 0 as hashCode implies all Values across all types are equal! The value should atleast be different for different Types.

And it also doesn't really satisfy the javadoc comment of "Returns zero to satisfy the Object equals/hashCode contract. This class is mutable and not meant to be used as a hash key." A HashMap with a value as a key will fail rather silently. With an UnsupportedOperationException one will know the error of one's way at once.

From an api design point of view we can dump Value equals and hashcode i.e. let direct comparisons using equals always return false (i.e. Object's implementation holds true) and same for hashCode.

And we could introduce a Values class that could introduce overloaded equals methods (Like Arrays, Collections etc.), we can merge Values and the present ValueHelper class into one class.

And Jukka I will get down to providing you with a patch and testcases. It might take some time though...

Jukka Zitting added a comment - 16/Feb/06 05:42 PM
The current hashCode() implementation is based on a discussion we had in April 2005, see http://thread.gmane.org/gmane.comp.apache.jackrabbit.devel/1561 (starting a few messages down the thread). There are valid arguments for many different implementations, but once we implement equals() the Object contract requires that hashCode() is also implemented at least in some way. Returning a constant was considered the best alternative.

> From an api design point of view we can dump Value equals and hashcode

No, we can't. :-) The JCR API explicitly says that Value.equals() must be implemented in a specific way. See the javadoc link for details.

> And Jukka I will get down to providing you with a patch and testcases. It might take some time though...

Thanks! I'm not decided if this is a serious enough breach of the JCR API to be a critical issue for the 1.0 release. If it is, then I can participate in getting this fixed, otherwise there is no need to hurry.

Marcel Reutegger added a comment - 16/Feb/06 06:55 PM
Piyush wrote:
> IMO returning 0 as hashCode implies all Values across all types are equal! The value should atleast be different for different Types.

this is not true. returning 0 (or any other fixed value) is a valid return value for hashCode(). See contract of Object.hashCode(). In fact the opposite is true: values that are equal implies that their hashCodes are equal!

Objects of different types can return the same hashCode() as long as equals will return false for two instances of different type.

Piyush Purang added a comment - 17/Feb/06 05:42 AM
Marcel you are spot on I erred there in my comment. What I wanted to say was returning different int values would have signaled different types of values.. I actually got carried away there :)

> No, we can't. :-) The JCR API explicitly says that Value.equals() must be implemented in a specific way. See the javadoc link for details.

You are right Jukka, I forgot jackrabbit 0.9 doesn't mean that the JCR API which is already 1.0 is still negotiable... :) and thanks for that discussion/mailing-list-archive link says a lot. So I guess I should stop throwing the hashCode issue around?

Here are some fresh comments on this issue

1. I have started a new package org.apache.jackrabbit.test.api.value with a TestAll and other classes. I hope that is fine.

2. Is there a way of getting the default encoding i.e. BaseValue.DEFAULT_ENCODING? and even more importantly is there a way of setting a Default_Encoding conveniently through out the repository for example what if a project wants to use UTF16 instead of UTF8 ? Has a thought been put into that? Perhaps a config property?

3. I have read the javadoc for Value (The third time mind you). Here are a few cases (to see if I got it right)

a) two Binary Value with texts "some text" according to api they should be equal and by implementation they are.
b) two Binary Values with bytes from "some text" in the same encoding according to api should be equal(?) and with the implementation I proposed they are equal.
what should happen if they have different encodings but the same content? I guess equals should fail.
c) two BinaryValues with input streams from the same file will fail equals if the input streams were instantiated independently. The javadoc asks not to try and compare big blobs so we stick to that.

But what happens when one BinaryValue has text = "some text" and another a byte array = "some text".getBytes(DEFAULT_ENCODING)?

and what happens when one BinaryValue has text = "some text" and another a stream from a file with the contents "some text"?

If we decide that a BV with text="some text" must equal a BV with byte array = "some text".getBytes(DEFAULT_ENCODING) etc. etc. then I need to know now as the present equals implementation and my proposed one fall way short! We also need to write more tests then.. many more

Next question: do we really need such a complex equals method i.e. do the other parts of the API really care?

Another thought - we could simplify things if we didn't have a string in there and just have a byte-array or a stream... as we can always re/create the string from the byte-array.

I hope no one's going to say ... seeesh leave it alone it's just an equals method! ;)


Jukka Zitting added a comment - 17/Feb/06 08:13 AM
I commented on the mailing list. Let's keep the Jira for tightly scoped comments about the issue at hand and use the mailing list for more general discussion. :-)

Jukka Zitting added a comment - 01/Jul/06 02:14 PM
Piyush, did you get around to creating a patch for this? If not, I'll just take your code from the above comments and apply it.