Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Minor Minor
    • Resolution: Fixed
    • Fix Version/s: 0.8 beta 1
    • Component/s: Core
    • Labels:
      None

      Description

      Unified UUIDType comparator, compares as time-based if both UUIDs are time-based, otherwise uses byte comparison. Based on code from the current LexicalUUIDType and TimeUUIDType comparers, so performance and behavior should be consistent and compatible.

      1. 2233.txt
        11 kB
        Jonathan Ellis
      2. UUIDTypeTest.java
        6 kB
        Ed Anuff
      3. UUIDType.java
        5 kB
        Ed Anuff

        Activity

        Hide
        Hudson added a comment -

        Integrated in Cassandra #841 (See https://hudson.apache.org/hudson/job/Cassandra/841/)

        Show
        Hudson added a comment - Integrated in Cassandra #841 (See https://hudson.apache.org/hudson/job/Cassandra/841/ )
        Hide
        Jonathan Ellis added a comment -

        That was the problem. Committed, thanks!

        Show
        Jonathan Ellis added a comment - That was the problem. Committed, thanks!
        Hide
        Ed Anuff added a comment - - edited

        That test is comparing a time-based and non-timed based UUID, I believe. My version of this was using:

        assertEquals(c, sign(compareUsingJUG(u1, u2)));

        Looks like you changed it to:

        + if (u1.version() == 1)
        + assertEquals(c, TimeUUIDType.instance.compare(bytebuffer(u1), bytebuffer(u2)));

        It needs to be this if you're testing compatibility with TimeUUIDType:

        + if ((u1.version() == 1) && (u2.version() == 1))
        + assertEquals(c, TimeUUIDType.instance.compare(bytebuffer(u1), bytebuffer(u2)));

        FWIW, I saw you pulled the compareUsingJUG() test. The thinking there was to have additional coverage by testing with another comparison implementation. If we want to remove a dependency on JUG, that's fine, and, of course, there's nothing canonical about JUG, except Cassandra is already using it and it has a well thought out and documented comparison function that's compatible with this one.

        Show
        Ed Anuff added a comment - - edited That test is comparing a time-based and non-timed based UUID, I believe. My version of this was using: assertEquals(c, sign(compareUsingJUG(u1, u2))); Looks like you changed it to: + if (u1.version() == 1) + assertEquals(c, TimeUUIDType.instance.compare(bytebuffer(u1), bytebuffer(u2))); It needs to be this if you're testing compatibility with TimeUUIDType: + if ((u1.version() == 1) && (u2.version() == 1)) + assertEquals(c, TimeUUIDType.instance.compare(bytebuffer(u1), bytebuffer(u2))); FWIW, I saw you pulled the compareUsingJUG() test. The thinking there was to have additional coverage by testing with another comparison implementation. If we want to remove a dependency on JUG, that's fine, and, of course, there's nothing canonical about JUG, except Cassandra is already using it and it has a well thought out and documented comparison function that's compatible with this one.
        Hide
        Jonathan Ellis added a comment -

        updated testCompare to use sign() (your method, not mine

        getting non-magnitude related disagreement now:

            [junit] Testcase: testCompare(org.apache.cassandra.db.marshal.UUIDTypeTest):	FAILED
            [junit] expected:<-1> but was:<1>
            [junit] junit.framework.AssertionFailedError: expected:<-1> but was:<1>
            [junit] 	at org.apache.cassandra.db.marshal.UUIDTypeTest.testCompare(UUIDTypeTest.java:154)
            [junit] 	at org.apache.cassandra.db.marshal.UUIDTypeTest.testCompare(UUIDTypeTest.java:49)
            [junit] 	at org.apache.cassandra.db.marshal.UUIDTypeTest.testCompare(UUIDTypeTest.java:154)
        
        Show
        Jonathan Ellis added a comment - updated testCompare to use sign() (your method, not mine getting non-magnitude related disagreement now: [junit] Testcase: testCompare(org.apache.cassandra.db.marshal.UUIDTypeTest): FAILED [junit] expected:<-1> but was:<1> [junit] junit.framework.AssertionFailedError: expected:<-1> but was:<1> [junit] at org.apache.cassandra.db.marshal.UUIDTypeTest.testCompare(UUIDTypeTest.java:154) [junit] at org.apache.cassandra.db.marshal.UUIDTypeTest.testCompare(UUIDTypeTest.java:49) [junit] at org.apache.cassandra.db.marshal.UUIDTypeTest.testCompare(UUIDTypeTest.java:154)
        Hide
        Ed Anuff added a comment - - edited

        Hi Jonathan, thanks for testing it out. It's following the convention of http://download.oracle.com/javase/6/docs/api/java/util/Comparator.html#compare%28T,%20T%29 to return a negative or positive value which is not necessarily -1 or +1. I can normalize it to those values if we want to hold to a tighter standard. What do you suggest?

        Edit: Oops, just saw you added a sign() method to do that already.

        Show
        Ed Anuff added a comment - - edited Hi Jonathan, thanks for testing it out. It's following the convention of http://download.oracle.com/javase/6/docs/api/java/util/Comparator.html#compare%28T,%20T%29 to return a negative or positive value which is not necessarily -1 or +1. I can normalize it to those values if we want to hold to a tighter standard. What do you suggest? Edit: Oops, just saw you added a sign() method to do that already.
        Hide
        Jonathan Ellis added a comment -

        rebased and changed testCompare to only care about version 1 results, and validate against TimeUUIDType, but I'm getting a failure:

            [junit] Testcase: testCompare(org.apache.cassandra.db.marshal.UUIDTypeTest):	FAILED
            [junit] expected:<-1> but was:<-23>
            [junit] junit.framework.AssertionFailedError: expected:<-1> but was:<-23>
            [junit] 	at org.apache.cassandra.db.marshal.UUIDTypeTest.testCompare(UUIDTypeTest.java:155)
            [junit] 	at org.apache.cassandra.db.marshal.UUIDTypeTest.testCompare(UUIDTypeTest.java:30)
        
        
        Show
        Jonathan Ellis added a comment - rebased and changed testCompare to only care about version 1 results, and validate against TimeUUIDType, but I'm getting a failure: [junit] Testcase: testCompare(org.apache.cassandra.db.marshal.UUIDTypeTest): FAILED [junit] expected:<-1> but was:<-23> [junit] junit.framework.AssertionFailedError: expected:<-1> but was:<-23> [junit] at org.apache.cassandra.db.marshal.UUIDTypeTest.testCompare(UUIDTypeTest.java:155) [junit] at org.apache.cassandra.db.marshal.UUIDTypeTest.testCompare(UUIDTypeTest.java:30)
        Hide
        Ed Anuff added a comment -

        Yes, that is correct.

        Show
        Ed Anuff added a comment - Yes, that is correct.
        Hide
        Jonathan Ellis added a comment -

        so UUIDT will compare identically to existing TUUIDT, assuming version 1?

        Show
        Jonathan Ellis added a comment - so UUIDT will compare identically to existing TUUIDT, assuming version 1?
        Hide
        Ed Anuff added a comment - - edited

        Attached new version and test case. It now sorts first by version number, then by time if both are version 1, and then lexically using msb to lsb byte comparison. Turns out this is actually the same way the org.safehaus.uuid.UUID.compareTo() method works, so there's precedent.

        FWIW, the current LexicalUUIDType is using java.util.UUID.compareTo() which, looking at the JDK source and doing some tests, appears not to be a lexical comparison as described in rfc4122 since it's doing a signed comparison.

        Show
        Ed Anuff added a comment - - edited Attached new version and test case. It now sorts first by version number, then by time if both are version 1, and then lexically using msb to lsb byte comparison. Turns out this is actually the same way the org.safehaus.uuid.UUID.compareTo() method works, so there's precedent. FWIW, the current LexicalUUIDType is using java.util.UUID.compareTo() which, looking at the JDK source and doing some tests, appears not to be a lexical comparison as described in rfc4122 since it's doing a signed comparison.
        Hide
        Jonathan Ellis added a comment -

        I'm not sure how useful the natural order of the other UUID versions is

        Agreed.

        don't access ByteBuffer.array() directly without asking if ByteBuffer.hasArray() first

        Yes, we have a bunch of methods in ByteBufferUtil that will use array if it exists otherwise will either duplicate and use relative methods or use byte-at-a-time get.

        write a unit test.

        +1, mostly it will just need to adapt what we have in TimeUUIDTypeTeset

        Show
        Jonathan Ellis added a comment - I'm not sure how useful the natural order of the other UUID versions is Agreed. don't access ByteBuffer.array() directly without asking if ByteBuffer.hasArray() first Yes, we have a bunch of methods in ByteBufferUtil that will use array if it exists otherwise will either duplicate and use relative methods or use byte-at-a-time get. write a unit test. +1, mostly it will just need to adapt what we have in TimeUUIDTypeTeset
        Hide
        Ed Anuff added a comment - - edited

        * generalize the type, make it deal with all kinds of variants/versions. (http://tools.ietf.org/html/rfc4122)

        I'm not sure how useful the natural order of the other UUID versions is. Minor point of correction, as per Java bug 7025832, the LexicalUUIDType inherits java.util.UUID.compareTo()'s signed comparison flaw, since the LexicalUUIDType doesn't do byte comparison, it's converting the bytes into java.util.UUIDs and using the UUID class' compareTo() method. Doing a byte comparison would match the rfc, which is reflected in the most recent version of patch proposed.

        * check if it makes sense to treat the Nil UUID differently (similarly to NULL in popular SQL databases).

        Yes, it does make sense. Nil UUID should always compare as less than.

        Show
        Ed Anuff added a comment - - edited * generalize the type, make it deal with all kinds of variants/versions. ( http://tools.ietf.org/html/rfc4122 ) I'm not sure how useful the natural order of the other UUID versions is. Minor point of correction, as per Java bug 7025832 , the LexicalUUIDType inherits java.util.UUID.compareTo()'s signed comparison flaw, since the LexicalUUIDType doesn't do byte comparison, it's converting the bytes into java.util.UUIDs and using the UUID class' compareTo() method. Doing a byte comparison would match the rfc, which is reflected in the most recent version of patch proposed. * check if it makes sense to treat the Nil UUID differently (similarly to NULL in popular SQL databases). Yes, it does make sense. Nil UUID should always compare as less than.
        Hide
        Folke Behrens added a comment -

        A few more suggestions:

        • generalize the type, make it deal with all kinds of variants/versions. (http://tools.ietf.org/html/rfc4122)
        • check if it makes sense to treat the Nil UUID differently (similarly to NULL in popular SQL databases).
        • don't access ByteBuffer.array() directly without asking if ByteBuffer.hasArray() first.
        • write a unit test.
        • format the code: http://wiki.apache.org/cassandra/CodeStyle
        • check with the devs if they would prefer a patch.
        Show
        Folke Behrens added a comment - A few more suggestions: generalize the type, make it deal with all kinds of variants/versions. ( http://tools.ietf.org/html/rfc4122 ) check if it makes sense to treat the Nil UUID differently (similarly to NULL in popular SQL databases). don't access ByteBuffer.array() directly without asking if ByteBuffer.hasArray() first. write a unit test. format the code: http://wiki.apache.org/cassandra/CodeStyle check with the devs if they would prefer a patch.
        Hide
        Ed Anuff added a comment - - edited

        Yes, I would think so.

        Caveat: deprecate == not recommend for new usage, existing column families using the LexicalUUID have to stay with it, for the reason Frank points out.

        Show
        Ed Anuff added a comment - - edited Yes, I would think so. Caveat: deprecate == not recommend for new usage, existing column families using the LexicalUUID have to stay with it, for the reason Frank points out.
        Hide
        Jonathan Ellis added a comment -

        That makes sense. So if we commit this it would make sense to deprecate LexicalUUID right?

        Show
        Jonathan Ellis added a comment - That makes sense. So if we commit this it would make sense to deprecate LexicalUUID right?
        Hide
        Ed Anuff added a comment -

        This wasn't necessarily suggested as a replacement for the two existing comparators but as an alternative for:

        (1) convenience

        (2) being able to switch your UUID generation technique later and while the ordering might not be useful, it would still be predictable. What happens now if you start with time-based UUIDs and switch to lexical?

        (3) use foreign generated UUID's with a preference for time-based sorting if possible

        Show
        Ed Anuff added a comment - This wasn't necessarily suggested as a replacement for the two existing comparators but as an alternative for: (1) convenience (2) being able to switch your UUID generation technique later and while the ordering might not be useful, it would still be predictable. What happens now if you start with time-based UUIDs and switch to lexical? (3) use foreign generated UUID's with a preference for time-based sorting if possible
        Hide
        Jonathan Ellis added a comment -

        I guess the idea is to drop both TimeUUIDType and Lexical for this?

        Part of me thinks that it's good to keep Time around so people who say they want chronological sorting gets that validated. And if you keep time around there's not much point in replacing Lexical UUID w/ this.

        Open to being convinced otherwise!

        Show
        Jonathan Ellis added a comment - I guess the idea is to drop both TimeUUIDType and Lexical for this? Part of me thinks that it's good to keep Time around so people who say they want chronological sorting gets that validated. And if you keep time around there's not much point in replacing Lexical UUID w/ this. Open to being convinced otherwise!
        Hide
        Ed Anuff added a comment -

        Removed incorrect copyright notice

        Show
        Ed Anuff added a comment - Removed incorrect copyright notice
        Hide
        Ed Anuff added a comment -

        Revised to fix comparison between time-based and non-time-based UUIDs. Time-based UUIDs now always compare as less than non-time-based UUIDs.

        Show
        Ed Anuff added a comment - Revised to fix comparison between time-based and non-time-based UUIDs. Time-based UUIDs now always compare as less than non-time-based UUIDs.
        Hide
        Ed Anuff added a comment -

        You are correct, I'll fix that.

        Show
        Ed Anuff added a comment - You are correct, I'll fix that.
        Hide
        Folke Behrens added a comment -

        This won't work because it violates the natural order. Imagine two time-based UUIDs t0 and t1 such that t0 < t1. There is a non-time-based UUID x such that x < t0 and x > t1. To remedy this you must not compare both types, you have to sort all time-based UUID either before or after all other UUIDs.

        Show
        Folke Behrens added a comment - This won't work because it violates the natural order. Imagine two time-based UUIDs t0 and t1 such that t0 < t1 . There is a non-time-based UUID x such that x < t0 and x > t1 . To remedy this you must not compare both types, you have to sort all time-based UUID either before or after all other UUIDs.

          People

          • Assignee:
            Ed Anuff
            Reporter:
            Ed Anuff
            Reviewer:
            Jonathan Ellis
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development