Knut Anders Hatlen made changes - 05/Dec/08 04:48 PM
Just a guess, but I think the problem might be that o.a.d.iapi.types.CollatorSQLVarchar doesn't override SQLChar.hashCode().
The abba/baab duplicate was removed because SQLChar.hashCode() just adds up the char values and ignores their positions, so the two strings get the same hash code: they contain exactly the same characters, only in a different order. It isn't necessary for this issue, but it would probably be better to calculate the hash with a formula similar to the one described here <URL:http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode()>. Just adding the char values gives a poor distribution, and the higher bits won't be used unless the string is very long.
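To illustrate the point about position-blind hashing, here is a minimal sketch. It is not Derby's code: the class and method names are made up for the example, additiveHash() only mimics the "add up the char values" behaviour described above, and polynomialHash() uses the recurrence documented for String.hashCode().

```java
// Hypothetical demo class, not part of Derby.
class HashDistributionDemo {

    // Position-blind hash: sums the char values, as described for SQLChar.hashCode().
    public static int additiveHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h += s.charAt(i);
        }
        return h;
    }

    // Position-sensitive hash: h = 31 * h + c, the recurrence documented for String.hashCode().
    public static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // Same characters in a different order collide under the additive hash.
        System.out.println(additiveHash("abba") == additiveHash("baab"));     // true
        // The polynomial hash takes the positions into account.
        System.out.println(polynomialHash("abba") == polynomialHash("baab")); // false
    }
}
```

With the additive version, a string needs a great many characters before the sum reaches the upper bits of the int, which is the distribution problem mentioned above.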
Knut Anders Hatlen made changes - 09/Dec/08 02:13 PM
Fixing this is probably as simple as overriding hashCode() in the CollatorSQL* classes with this implementation (perhaps with some special handling of null):
    public int hashCode() { return getCollationKey().hashCode(); }
The CollatorSQL* classes use CollationKey.compareTo() to implement Comparable.compareTo() and Object.equals(), so using CollationKey.hashCode() should give the required consistency between compareTo(), equals() and hashCode().
What are the upgrade implications of changing the hash code implementations of a data type?
I don't think we store the hash values anywhere, so to my knowledge they just have to be consistent within the lifetime of the JVM. The only place I know that the hash values will go to disk, is when a BackingStoreHashtable spills to disk, but that's just in a temporary file that will be deleted on the next boot, as far as I know.
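The consistency argument for a CollationKey-based hashCode() can be sketched outside Derby with plain java.text classes. Everything below is illustrative: CollatedString is a hypothetical stand-in for the CollatorSQL* classes, and frenchCollator() turns on canonical decomposition explicitly, which is an assumption of this sketch and not necessarily how Derby configures its collators. Null handling is left out.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical demo class, not part of Derby.
class CollationKeyHashDemo {

    // Stand-in for a CollatorSQL* value: compareTo(), equals() and hashCode()
    // are all derived from the same CollationKey, so they cannot disagree.
    public static final class CollatedString implements Comparable<CollatedString> {
        private final CollationKey key;

        public CollatedString(String value, Collator collator) {
            // Note: getCollationKey() returns null for a null string; not handled here.
            this.key = collator.getCollationKey(value);
        }

        @Override
        public int compareTo(CollatedString other) {
            return key.compareTo(other.key);
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof CollatedString
                    && key.equals(((CollatedString) other).key);
        }

        @Override
        public int hashCode() {
            return key.hashCode();
        }
    }

    // Assumption of this sketch: decomposition is enabled so that canonically
    // equivalent strings are normalized before their keys are built.
    public static Collator frenchCollator() {
        Collator collator = Collator.getInstance(Locale.FRENCH);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator;
    }

    public static void main(String[] args) {
        Collator collator = frenchCollator();
        CollatedString composed = new CollatedString("\u00C0", collator);    // precomposed A-grave
        CollatedString decomposed = new CollatedString("A\u0300", collator); // 'A' + combining grave

        // A HashSet plays the role of a distinct scan: values that are equal
        // under the collation must land in the same bucket to be de-duplicated.
        Set<CollatedString> distinct = new HashSet<>();
        distinct.add(composed);
        distinct.add(decomposed);

        System.out.println("compareTo == 0: " + (composed.compareTo(decomposed) == 0));
        System.out.println("same hashCode:  " + (composed.hashCode() == decomposed.hashCode()));
        System.out.println("distinct count: " + distinct.size());
    }
}
```

Because the hash is computed at run time from the key, nothing in this scheme depends on hash values surviving beyond the running JVM, which matches the reasoning about upgrades above.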
Knut Anders Hatlen made changes - 16/Dec/08 10:47 PM
The attached patch attempts to fix the problem by implementing a hashCode() method in CollatorSQLChar, CollatorSQLVarchar, CollatorSQLLongvarchar and CollatorSQLClob based on CollationKey.hashCode(). It also extends CollationTest.compareAgrave() with a test case for SELECT DISTINCT, and makes it test both CHAR and VARCHAR (previously it only tested VARCHAR). CollationTest fails without the fix and passes with the fix. The test is based on the fact that in the French locale, À (Unicode code point 00C0) is the same as À ('A' + Unicode code point 0300), whereas they are different in UCS_BASIC.
I will start the regression tests now.
Knut Anders Hatlen made changes - 16/Dec/08 10:54 PM
All the regression tests passed.
Knut Anders Hatlen made changes - 17/Dec/08 07:11 AM
Knut Anders Hatlen made changes - 17/Dec/08 12:55 PM
One thing I forgot to mention in the description of the patch:
The patch changes the hashCode() implementation in CollatorSQL{Char,Varchar,Longvarchar,Clob}, but it only tests CHAR and VARCHAR. That's because LONG VARCHAR and CLOB columns cannot be used in distinct queries without casting them to another data type first, and since you cannot compare two such columns with =, you cannot perform a hash join to test it either. I'm not aware of any other code than distinct scans and hash scans that will ever call the hashCode() methods of these objects, so I don't know of any way to test those two data types.
Committed revision 728822.
I'll keep the issue open until the fix has been back-ported to 10.4 and 10.3.
Knut Anders Hatlen made changes - 22/Dec/08 11:23 PM
Merged the fix to the 10.4 branch and committed revision 730163.
The patch didn't merge cleanly to 10.3 because of conflicts in CollationTest. If someone wants to do the manual back-port to 10.3, we can reopen the issue later, but I'm marking it as resolved and closing it for now.
Knut Anders Hatlen made changes - 30/Dec/08 04:24 PM
Reopening the issue to merge the changes into the 10.3 codeline.
Mamta A. Satoor made changes - 09/Jan/09 05:38 PM
Merged the fix to the 10.3 branch (had to do some manual tweaking of CollationTest) and committed revision 733094.
Mamta A. Satoor made changes - 09/Jan/09 05:41 PM
Knut Anders Hatlen made changes - 25/Apr/09 05:21 PM
Myrna van Lunteren made changes - 04/May/09 06:22 PM
$ mkdir -p META-INF/services
$ javac AisBCollatorProvider.java
$ echo AisBCollatorProvider > META-INF/services/java.text.spi.CollatorProvider
$ jar cf aisb-collator.jar AisBCollatorProvider.class META-INF
$ java -Djava.ext.dirs=. -jar /path/to/derbyrun.jar ij