Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 10.6.1.0
    • Component/s: SQL
    • Labels:
      None

      Description

      By default MySQL is case insensitive in its string comparisons, as you can see from the MySQL docs shown below. Similar functionality is available in Sybase iAnywhere and in SQLServer. I'd like the same to be true for Derby.

      What, I wonder, are chances of that?

      I am aware that functions could be used to force comparisons in upper case but that subverts the indexes and makes searches unacceptably long.

      If you were to ask people you might find that this is a feature whose abscence is causing many to look elsewhere.

      thanks for all the great work,

      Terry

      The MySQL Docs say:

      -------- start quote

      By default, MySQL searches are not case sensitive (although there are some character sets that are never case insensitive, such as czech). This means that if you search with col_name LIKE 'a%', you get all column values that start with A or a. If you want to make this search case sensitive, make sure that one of the operands has a case sensitive or binary collation. For example, if you are comparing a column and a string that both have the latin1 character set, you can use the COLLATE operator to cause either operand to have the latin1_general_cs or latin1_bin collation. For example:

      col_name COLLATE latin1_general_cs LIKE 'a%'
      col_name LIKE 'a%' COLLATE latin1_general_cs
      col_name COLLATE latin1_bin LIKE 'a%'
      col_name LIKE 'a%' COLLATE latin1_bin

      If you want a column always to be treated in case-sensitive fashion, declare it with a case sensitive or binary collation. See Section 13.1.5, "CREATE TABLE Syntax".

      By default, the search is performed in case-insensitive fashion. In MySQL 4.1 and up, you can make a full-text search by using a binary collation for the indexed columns. For example, a column that has a character set of latin1 can be assigned a collation of latin1_bin to make it case sensitive for full-text searches.

      --------------- end quote

      1. remove_dead_code.diff
        3 kB
        Knut Anders Hatlen
      2. refman.txt
        1 kB
        Gunnar Grim
      3. devguide.txt
        0.9 kB
        Gunnar Grim
      4. collation-strength-2.diff
        22 kB
        Gunnar Grim
      5. collation-strength-1.diff
        14 kB
        Gunnar Grim
      6. collation-strength.diff
        12 kB
        Gunnar Grim

        Issue Links

          Activity

          Hide
          Daniel John Debrunner added a comment -

          One potentially easy way to provide this functionality would be to expand the values supported by the collation JDBC attribute.

          Today it supports UCS_BASIC (fixed collation based upon Unicode codepoint) or TERRITORY_BASED (collation using the locale (territory) of the database and collation provided by a RuleBasedCollator object obtained using the locale. The collation is set & fixed at create database time.

          The TERRITORY_BASED could be expanded to allow setting the strength of the RuleBasedCollator being used, e.g.

          TERRITORY_BASED - default strength

          TERRITORY_BASED:PRIMARY - strength set using collator.setStrength(Collator.PRIMARY)
          TERRITORY_BASED:SECONDARY - strength set using collator.setStrength(Collator.SECONDARY)
          TERRITORY_BASED:TERTIARY - strength set using collator.setStrength(Collator.TERTIARY )
          TERRITORY_BASED:IDENTICAL - strength set using collator.setStrength(Collator.IDENTICAL)

          This allows some user control of the collation and for some (most) languages will provide case-insensitive searches.

          E.g. according to the javadoc for java.text.Collator with a locale of en_US then TERRITORY_BASED:PRIMARY will give case insensitive searches.

          Apart from testing I think this would be around 20-30 lines of code:
          a) code to parse the new collation attribute values
          b) new constants for collation with strength (four new constants)
          c) code to set the strength based upon those constants.

          Show
          Daniel John Debrunner added a comment - One potentially easy way to provide this functionality would be to expand the values supported by the collation JDBC attribute. Today it supports UCS_BASIC (fixed collation based upon Unicode codepoint) or TERRITORY_BASED (collation using the locale (territory) of the database and collation provided by a RuleBasedCollator object obtained using the locale. The collation is set & fixed at create database time. The TERRITORY_BASED could be expanded to allow setting the strength of the RuleBasedCollator being used, e.g. TERRITORY_BASED - default strength TERRITORY_BASED:PRIMARY - strength set using collator.setStrength(Collator.PRIMARY) TERRITORY_BASED:SECONDARY - strength set using collator.setStrength(Collator.SECONDARY) TERRITORY_BASED:TERTIARY - strength set using collator.setStrength(Collator.TERTIARY ) TERRITORY_BASED:IDENTICAL - strength set using collator.setStrength(Collator.IDENTICAL) This allows some user control of the collation and for some (most) languages will provide case-insensitive searches. E.g. according to the javadoc for java.text.Collator with a locale of en_US then TERRITORY_BASED:PRIMARY will give case insensitive searches. Apart from testing I think this would be around 20-30 lines of code: a) code to parse the new collation attribute values b) new constants for collation with strength (four new constants) c) code to set the strength based upon those constants.
          Hide
          David Clements added a comment -

          There are very few practical database applications where users want case sensitive searching.
          Most database systems actually make case insensitive system the default, and we haven't come across any other DBMS that doesn't cater for it in a system level or database level setting (i.e. without having to apply case conversion functions).

          We believe the case sensitive searching issue to be a serious impediment to the take-up of Derby, and one that should be addressed as soon as possible.

          Daniel Debrunners suggested solution of extending collation to provide case-insensitive searching at database level gets our vote as the most practical way of fixing the problem.

          Show
          David Clements added a comment - There are very few practical database applications where users want case sensitive searching. Most database systems actually make case insensitive system the default, and we haven't come across any other DBMS that doesn't cater for it in a system level or database level setting (i.e. without having to apply case conversion functions). We believe the case sensitive searching issue to be a serious impediment to the take-up of Derby, and one that should be addressed as soon as possible. Daniel Debrunners suggested solution of extending collation to provide case-insensitive searching at database level gets our vote as the most practical way of fixing the problem.
          Hide
          geoff hendrey added a comment -

          I'd also agree that case-insensitivity is an absolute must, as least for LIKE comparisons. I'd be very eager to know the plans to get this implemented.

          Show
          geoff hendrey added a comment - I'd also agree that case-insensitivity is an absolute must, as least for LIKE comparisons. I'd be very eager to know the plans to get this implemented.
          Hide
          geoff hendrey added a comment -

          It was mentioned on the derby-user mail list that "auto generated columns" might be a way to deal with case-insensitive searching. It would be good to bring that discussion onto this JIRA issue, so that it may be considered in the context of other proposed solutions.

          I re-emphasize that the ability to do case-insensitive LIKE comparisons is absolutely critical. Think about virtually any application, such as a blog application. It's expected that one does not have to enter somebody's username or blog posting with exact-matching case in order to retrieve it. I have tries solutions such as using UPPER in the query. These work only for trivially small tables. When the number of rows grows, I saw query times taking many seconds (15 seconds! --it was a while back, but that is my recollection), when I used UPPER to perform comparisons against a regular index of a string column.

          Perhaps there is a hybrid solution, that uses, for example, UPPER, in conjunction with an index. What if we could force an index to store an uppercase version of the column? Then as long as we used UPPER in our query, the search would be perfectly efficient, against the uppercase index. Or more generally, what if could intercept any column value, before placing it into the index, or updating the index, and apply a scalar (non-aggregate) built-in function to the column value?

          This is the existing syntax for creating an index:

          CREATE [UNIQUE] INDEX index-Name
          ON table-Name ( Simple-column-Name [ ASC | DESC ]
          [ , Simple-column-Name [ ASC | DESC ]] * )

          Here is a proposed modification, backwards compatible, to allow scalar (non-aggregate) functions to be applied to columns in the index:

          ///--begin BNF-like syntax --///

          CREATE [UNIQUE] INDEX index-Name ON table-Name ( Intercepted-column [,Intercepted-column]* )

          Intercepted-column:

          ((Built-in-function '(' Simple-column-Name ')' |Simple-column-Name) [ASC|DESC]) [ , (Built-in-function '(' Simple-column-Name ')' |Simple-column-Name) [ ASC | DESC ]] *

          ///---end BNF-like syntax --///

          This allows us to do things like

          "CREATE INDEX NAME_UPPERCASE ON MYTABLE(UPPER(NAME) ASC)"

          or "CREATE INDEX SPEED ON MYTABLE(ABS(VELOCITY))"

          basically we can apply scalar functions to the columns before they are indexed, and when the index is updated.

          Show
          geoff hendrey added a comment - It was mentioned on the derby-user mail list that "auto generated columns" might be a way to deal with case-insensitive searching. It would be good to bring that discussion onto this JIRA issue, so that it may be considered in the context of other proposed solutions. I re-emphasize that the ability to do case-insensitive LIKE comparisons is absolutely critical. Think about virtually any application, such as a blog application. It's expected that one does not have to enter somebody's username or blog posting with exact-matching case in order to retrieve it. I have tries solutions such as using UPPER in the query. These work only for trivially small tables. When the number of rows grows, I saw query times taking many seconds (15 seconds! --it was a while back, but that is my recollection), when I used UPPER to perform comparisons against a regular index of a string column. Perhaps there is a hybrid solution, that uses, for example, UPPER, in conjunction with an index. What if we could force an index to store an uppercase version of the column? Then as long as we used UPPER in our query, the search would be perfectly efficient, against the uppercase index. Or more generally, what if could intercept any column value, before placing it into the index, or updating the index, and apply a scalar (non-aggregate) built-in function to the column value? This is the existing syntax for creating an index: CREATE [UNIQUE] INDEX index-Name ON table-Name ( Simple-column-Name [ ASC | DESC ] [ , Simple-column-Name [ ASC | DESC ]] * ) Here is a proposed modification, backwards compatible, to allow scalar (non-aggregate) functions to be applied to columns in the index: ///--begin BNF-like syntax --/// CREATE [UNIQUE] INDEX index-Name ON table-Name ( Intercepted-column [,Intercepted-column] * ) Intercepted-column: ((Built-in-function '(' Simple-column-Name ')' |Simple-column-Name) [ASC|DESC] ) [ , (Built-in-function '(' Simple-column-Name ')' |Simple-column-Name) [ ASC | DESC ]] * ///---end BNF-like syntax --/// This allows us to do things like "CREATE INDEX NAME_UPPERCASE ON MYTABLE(UPPER(NAME) ASC)" or "CREATE INDEX SPEED ON MYTABLE(ABS(VELOCITY))" basically we can apply scalar functions to the columns before they are indexed, and when the index is updated.
          Hide
          Rick Hillegas added a comment -

          Hi Geoff,

          Thanks for continuing to look at this issue. You are, of course, right that generated columns only help you if you put an index on the generated column. Your current proposal looks to me like DERBY-455 (expression indexes). I am linking all of these issues together to help focus the discussion: DERBY-455 (expression indexes), DERBY-481 (generated columns), DERBY-1748 (case-insensitive search).

          I agree that DERBY-455 is an elegant, powerful feature. However, I think that it involves almost all of the work needed for DERBY-481. In addition, DERBY-455 requires that we teach the optimizer how to recognize query fragments which can be matched to these new indexes. DERBY-481 used to have a serious defect, which was that you couldn't drop these generated columns if you made a mistake--but that objection went away with release 10.3.

          So right now, DERBY-481 looks to me like the cheapest way to get case-insensitive searching--provided that the user indexes the generated column.

          Show
          Rick Hillegas added a comment - Hi Geoff, Thanks for continuing to look at this issue. You are, of course, right that generated columns only help you if you put an index on the generated column. Your current proposal looks to me like DERBY-455 (expression indexes). I am linking all of these issues together to help focus the discussion: DERBY-455 (expression indexes), DERBY-481 (generated columns), DERBY-1748 (case-insensitive search). I agree that DERBY-455 is an elegant, powerful feature. However, I think that it involves almost all of the work needed for DERBY-481 . In addition, DERBY-455 requires that we teach the optimizer how to recognize query fragments which can be matched to these new indexes. DERBY-481 used to have a serious defect, which was that you couldn't drop these generated columns if you made a mistake--but that objection went away with release 10.3. So right now, DERBY-481 looks to me like the cheapest way to get case-insensitive searching--provided that the user indexes the generated column.
          Hide
          geoff hendrey added a comment -

          Hi Rick,

          Sorry I did not review 455, because it appears my proposal is identical to 455. Issue 455 correctly notes that the function being indexed must be even more constrained than scalar; it must be deterministic. I had only considered the fact that it must be scalar (cannot be an aggregate).

          I would be fine with the proposal for generated columns, 481, assuming my read of its description is correct. That is:
          1) updates and inserts propagate seamlessly to the generated columns
          2) dropping the "owner" column can seamlessly cascade to dropping the generated column
          3) any indexes created on the generated column seamlessly disappear when the generated column is dropped

          Is my understanding correct?

          Show
          geoff hendrey added a comment - Hi Rick, Sorry I did not review 455, because it appears my proposal is identical to 455. Issue 455 correctly notes that the function being indexed must be even more constrained than scalar; it must be deterministic. I had only considered the fact that it must be scalar (cannot be an aggregate). I would be fine with the proposal for generated columns, 481, assuming my read of its description is correct. That is: 1) updates and inserts propagate seamlessly to the generated columns 2) dropping the "owner" column can seamlessly cascade to dropping the generated column 3) any indexes created on the generated column seamlessly disappear when the generated column is dropped Is my understanding correct?
          Hide
          Rick Hillegas added a comment -

          Hi Geoff,

          For the gory details, you may want to read the functional spec attached to DERBY-481. Here are some responses to your questions:

          >1) updates and inserts propagate seamlessly to the generated columns

          Yes. Derby will recalculate the values of generated columns if you change (insert/update) the values of columns referenced by the generation clause.

          >2) dropping the "owner" column can seamlessly cascade to dropping the generated column

          The default behavior of DROP COLUMN will continue to be CASCADE, which will produce the results you want. This detail is tucked away in the functional spec in the "Behavior" section.

          >3) any indexes created on the generated column seamlessly disappear when the generated column is dropped

          Yes, this is how it will work.

          Hope this helps,
          -Rick

          Show
          Rick Hillegas added a comment - Hi Geoff, For the gory details, you may want to read the functional spec attached to DERBY-481 . Here are some responses to your questions: >1) updates and inserts propagate seamlessly to the generated columns Yes. Derby will recalculate the values of generated columns if you change (insert/update) the values of columns referenced by the generation clause. >2) dropping the "owner" column can seamlessly cascade to dropping the generated column The default behavior of DROP COLUMN will continue to be CASCADE, which will produce the results you want. This detail is tucked away in the functional spec in the "Behavior" section. >3) any indexes created on the generated column seamlessly disappear when the generated column is dropped Yes, this is how it will work. Hope this helps, -Rick
          Hide
          Tom Fonteyne added a comment -

          I'm porting a fairly large application to use Derby in embedded mode.
          But lacking case-insensitive searches is a showstopper.

          Right now I will have to defect to MySQL or perhaps to H2 (although I'm not sure how mature the product is) - but I really would prefer to use Derby.

          Kind regards,
          Tom

          Show
          Tom Fonteyne added a comment - I'm porting a fairly large application to use Derby in embedded mode. But lacking case-insensitive searches is a showstopper. Right now I will have to defect to MySQL or perhaps to H2 (although I'm not sure how mature the product is) - but I really would prefer to use Derby. Kind regards, Tom
          Hide
          Pieter-Jan Savat added a comment -

          Having no way to do a case-insensitive search on a table is a blocking issue for us.
          With DERBY-481there may be a quick solution. But I'm wondering, if a table contains 20+ columns that can be used in a search,
          whether copying these columns doesn't affect performance and bloat the database.

          Anyway any kind of solution (not involving manually added triggers) anytime soon would be great...

          Best regards,
          Pieter-Jan

          Show
          Pieter-Jan Savat added a comment - Having no way to do a case-insensitive search on a table is a blocking issue for us. With DERBY-481 there may be a quick solution. But I'm wondering, if a table contains 20+ columns that can be used in a search, whether copying these columns doesn't affect performance and bloat the database. Anyway any kind of solution (not involving manually added triggers) anytime soon would be great... Best regards, Pieter-Jan
          Hide
          Knut Anders Hatlen added a comment -

          What Dan suggests in his comment (02/Nov/07) could be achieved with no changes in Derby if you're using Java SE 6. Define your own java.text.spi.CollatorProvider which returns a Collator on which you have called setStrength() and tell Derby to use that collator (territory=<your-custom-locale>;collation=TERRITORY_BASED;create=true). I just posted an example here: http://blogs.sun.com/kah/entry/user_defined_collation_in_apache

          Show
          Knut Anders Hatlen added a comment - What Dan suggests in his comment (02/Nov/07) could be achieved with no changes in Derby if you're using Java SE 6. Define your own java.text.spi.CollatorProvider which returns a Collator on which you have called setStrength() and tell Derby to use that collator (territory=<your-custom-locale>;collation=TERRITORY_BASED;create=true). I just posted an example here: http://blogs.sun.com/kah/entry/user_defined_collation_in_apache
          Hide
          geoff hendrey added a comment -

          Two questions:
          1) does this mean that '=' and 'LIKE' comparisons can be case insensitive?
          2) is it possible to switch between case-sensitive and case-insensitive comparisons on the fly?

          -geoff
          “The Americans would be less dangerous if they had a regular army.”
          – British General Frederick Haldimand, Boston, 1776

          Show
          geoff hendrey added a comment - Two questions: 1) does this mean that '=' and 'LIKE' comparisons can be case insensitive? 2) is it possible to switch between case-sensitive and case-insensitive comparisons on the fly? -geoff “The Americans would be less dangerous if they had a regular army.” – British General Frederick Haldimand, Boston, 1776
          Hide
          Rick Hillegas added a comment -

          Geoff hendrey wrote:
          > Two questions:
          > 1) does this mean that '=' and 'LIKE' comparisons can be case insensitive?
          Yes. Bear in mind, however, that the LIKE optimizations do not work on territory-based collations today. See http://issues.apache.org/jira/browse/DERBY-3854
          > 2) is it possible to switch between case-sensitive and case-insensitive comparisons on the fly?
          Not with indexed support, that I'm aware of. You can always wrap your column references with normalizing functions like UPPER and get the semantics (but not necessarily the performance) that you want. For more speculation about how to handle these issues, please see http://www.nabble.com/ORDER-BY-and-greek-characters-td20748193.html#a20748193

          Hope this helps,
          -Rick

          Show
          Rick Hillegas added a comment - Geoff hendrey wrote: > Two questions: > 1) does this mean that '=' and 'LIKE' comparisons can be case insensitive? Yes. Bear in mind, however, that the LIKE optimizations do not work on territory-based collations today. See http://issues.apache.org/jira/browse/DERBY-3854 > 2) is it possible to switch between case-sensitive and case-insensitive comparisons on the fly? Not with indexed support, that I'm aware of. You can always wrap your column references with normalizing functions like UPPER and get the semantics (but not necessarily the performance) that you want. For more speculation about how to handle these issues, please see http://www.nabble.com/ORDER-BY-and-greek-characters-td20748193.html#a20748193 Hope this helps, -Rick
          Hide
          Gunnar Grim added a comment -

          I have now implemented Dans suggestions and it seems to work fine. LIKE comparisions, equality comparisions and unique indices are case insensitive with strength SECONDARY.

          In addition to the below changes I suppose that the database version number needs to be changed since a database with a collation type like TERRITORY_BASED:SECONDARY won't work with an older Derby version.

          The changed classes are:

          org.apache.derby.impl.sql.compile.CreateTableNode
          Collation type translation to name changed.

          org.apache.derby.impl.sql.catalog.DataDictionaryImpl
          Initialization of collation type changed when booting database.

          org.apache.derby.iapi.reference.Property
          Four new string constants for TERRITORY_BASED:PRIMARY etc.

          org.apache.derby.iapi.types.DataTypeDescriptor
          New static method getCollationType(String) for translating property value text to int.
          New static method getCollationName(int) for translating int to property value text.
          Method getCollationName() uses above method.

          org.apache.derby.iapi.types.DataValueFactoryImpl
          Initialization of collation type changed when creating database.
          Strength set on collator when a strength variant of TERRITORY_BASED is used.

          org.apache.derby.iapi.types.StringDataValue
          Four new int constants for COLLATION_TYPE_TERRITORY_BASED_PRIMARY etc.

          Show
          Gunnar Grim added a comment - I have now implemented Dans suggestions and it seems to work fine. LIKE comparisions, equality comparisions and unique indices are case insensitive with strength SECONDARY. In addition to the below changes I suppose that the database version number needs to be changed since a database with a collation type like TERRITORY_BASED:SECONDARY won't work with an older Derby version. The changed classes are: org.apache.derby.impl.sql.compile.CreateTableNode Collation type translation to name changed. org.apache.derby.impl.sql.catalog.DataDictionaryImpl Initialization of collation type changed when booting database. org.apache.derby.iapi.reference.Property Four new string constants for TERRITORY_BASED:PRIMARY etc. org.apache.derby.iapi.types.DataTypeDescriptor New static method getCollationType(String) for translating property value text to int. New static method getCollationName(int) for translating int to property value text. Method getCollationName() uses above method. org.apache.derby.iapi.types.DataValueFactoryImpl Initialization of collation type changed when creating database. Strength set on collator when a strength variant of TERRITORY_BASED is used. org.apache.derby.iapi.types.StringDataValue Four new int constants for COLLATION_TYPE_TERRITORY_BASED_PRIMARY etc.
          Hide
          geoff hendrey added a comment -

          Could you provide a synopsis of the recommended way to do case insensitive string comparison, taking into account this new capability you added?

          -geoff


          http://nextdb.net - RESTful Relational Database
          http://www.nextdb.net/wiki/en/REST

          — On Tue, 3/9/10, Gunnar Grim (JIRA) <jira@apache.org> wrote:

          From: Gunnar Grim (JIRA) <jira@apache.org>
          Subject: [jira] Updated: (DERBY-1748) Global case insensitive setting
          To: geoff_hendrey@yahoo.com
          Date: Tuesday, March 9, 2010, 12:15 AM

               [ https://issues.apache.org/jira/browse/DERBY-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

          Gunnar Grim updated DERBY-1748:
          -------------------------------

              Attachment: collation-strength.diff

          I have now implemented Dans suggestions and it seems to work fine. LIKE comparisions, equality comparisions and unique indices are case insensitive with strength SECONDARY.

          In addition to the below changes I suppose that the database version number needs to be changed since a database with a collation type like TERRITORY_BASED:SECONDARY won't work with an older Derby version.

          The changed classes are:

          org.apache.derby.impl.sql.compile.CreateTableNode
            Collation type translation to name changed.

          org.apache.derby.impl.sql.catalog.DataDictionaryImpl
            Initialization of collation type changed when booting database.

          org.apache.derby.iapi.reference.Property
            Four new string constants for TERRITORY_BASED:PRIMARY etc.

          org.apache.derby.iapi.types.DataTypeDescriptor
            New static method getCollationType(String) for translating property value text to int.
            New static method getCollationName(int) for translating int to property value text.
            Method getCollationName() uses above method.

          org.apache.derby.iapi.types.DataValueFactoryImpl
            Initialization of collation type changed when creating database.
            Strength set on collator when a strength variant of TERRITORY_BASED is used.

          org.apache.derby.iapi.types.StringDataValue
            Four new int constants for COLLATION_TYPE_TERRITORY_BASED_PRIMARY etc.


          This message is automatically generated by JIRA.
          -
          You can reply to this email to add a comment to the issue online.

          Show
          geoff hendrey added a comment - Could you provide a synopsis of the recommended way to do case insensitive string comparison, taking into account this new capability you added? -geoff – http://nextdb.net - RESTful Relational Database http://www.nextdb.net/wiki/en/REST — On Tue, 3/9/10, Gunnar Grim (JIRA) <jira@apache.org> wrote: From: Gunnar Grim (JIRA) <jira@apache.org> Subject: [jira] Updated: ( DERBY-1748 ) Global case insensitive setting To: geoff_hendrey@yahoo.com Date: Tuesday, March 9, 2010, 12:15 AM      [ https://issues.apache.org/jira/browse/DERBY-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunnar Grim updated DERBY-1748 : -------------------------------     Attachment: collation-strength.diff I have now implemented Dans suggestions and it seems to work fine. LIKE comparisions, equality comparisions and unique indices are case insensitive with strength SECONDARY. In addition to the below changes I suppose that the database version number needs to be changed since a database with a collation type like TERRITORY_BASED:SECONDARY won't work with an older Derby version. The changed classes are: org.apache.derby.impl.sql.compile.CreateTableNode   Collation type translation to name changed. org.apache.derby.impl.sql.catalog.DataDictionaryImpl   Initialization of collation type changed when booting database. org.apache.derby.iapi.reference.Property   Four new string constants for TERRITORY_BASED:PRIMARY etc. org.apache.derby.iapi.types.DataTypeDescriptor   New static method getCollationType(String) for translating property value text to int.   New static method getCollationName(int) for translating int to property value text.   Method getCollationName() uses above method. org.apache.derby.iapi.types.DataValueFactoryImpl   Initialization of collation type changed when creating database.   Strength set on collator when a strength variant of TERRITORY_BASED is used. org.apache.derby.iapi.types.StringDataValue   Four new int constants for COLLATION_TYPE_TERRITORY_BASED_PRIMARY etc. – This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
          Hide
          Gunnar Grim added a comment -

          Not sure what you mean Geoff, so I'll assume you are thinking of string comparisons 
          in SQL.

          All string comparisons I've tested become case insensitive with a database 
          that uses TERRITORY_BASED:SECONDARY collation. I've tested the following

          name LIKE 'a%'
           matches both "Adam" and "adam"

          name = 'adam'
           matches both "Adam" and "adam"

          name BETWEEN 'a' AND 'c'
           matches both "Baker" and "baker"

          name >= 'a' AND name <= 'c'
           matches both "Baker" and "baker"

          Using a varchar column as a primary key will consider 'Adam' and 'adam' as duplicates.

          JOIN's compare case insensitively.

          -Gunnar

          Show
          Gunnar Grim added a comment - Not sure what you mean Geoff, so I'll assume you are thinking of string comparisons  in SQL. All string comparisons I've tested become case insensitive with a database  that uses TERRITORY_BASED:SECONDARY collation. I've tested the following name LIKE 'a%'  matches both "Adam" and "adam" name = 'adam'  matches both "Adam" and "adam" name BETWEEN 'a' AND 'c'  matches both "Baker" and "baker" name >= 'a' AND name <= 'c'  matches both "Baker" and "baker" Using a varchar column as a primary key will consider 'Adam' and 'adam' as duplicates. JOIN's compare case insensitively. -Gunnar
          Hide
          geoff hendrey added a comment -

          Hi,

          I was just thinking it would be nice, now that you've shown this works, to have a pointer to a derby doc section on ""how to make your database case insensitive". If the details are hidden in a section on character collation, I don't think the average user of Derby will ever stumble upon it. Basically, out of laziness, I thought it would be good for you to send an email to the derby user group explaining how to make your database case insensitive. It's been a much desired feature.

          -geoff


          http://nextdb.net - RESTful Relational Database
          http://www.nextdb.net/wiki/en/REST

          — On Wed, 3/10/10, Gunnar Grim (JIRA) <jira@apache.org> wrote:

          From: Gunnar Grim (JIRA) <jira@apache.org>
          Subject: [jira] Commented: (DERBY-1748) Global case insensitive setting
          To: geoff_hendrey@yahoo.com
          Date: Wednesday, March 10, 2010, 12:13 AM

              [ https://issues.apache.org/jira/browse/DERBY-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843473#action_12843473 ]

          Gunnar Grim commented on DERBY-1748:
          ------------------------------------

          Not sure what you mean Geoff, so I'll assume you are thinking of string comparisons 
          in SQL.

          All string comparisons I've tested become case insensitive with a database 
          that uses TERRITORY_BASED:SECONDARY collation. I've tested the following

          name LIKE 'a%'
           matches both "Adam" and "adam"

          name = 'adam'
           matches both "Adam" and "adam"

          name BETWEEN 'a' AND 'c'
           matches both "Baker" and "baker"

          name >= 'a' AND name <= 'c'
           matches both "Baker" and "baker"

          Using a varchar column as a primary key will consider 'Adam' and 'adam' as duplicates.

          JOIN's compare case insensitively.

          -Gunnar


          This message is automatically generated by JIRA.
          -
          You can reply to this email to add a comment to the issue online.

          Show
          geoff hendrey added a comment - Hi, I was just thinking it would be nice, now that you've shown this works, to have a pointer to a derby doc section on ""how to make your database case insensitive". If the details are hidden in a section on character collation, I don't think the average user of Derby will ever stumble upon it. Basically, out of laziness, I thought it would be good for you to send an email to the derby user group explaining how to make your database case insensitive. It's been a much desired feature. -geoff – http://nextdb.net - RESTful Relational Database http://www.nextdb.net/wiki/en/REST — On Wed, 3/10/10, Gunnar Grim (JIRA) <jira@apache.org> wrote: From: Gunnar Grim (JIRA) <jira@apache.org> Subject: [jira] Commented: ( DERBY-1748 ) Global case insensitive setting To: geoff_hendrey@yahoo.com Date: Wednesday, March 10, 2010, 12:13 AM     [ https://issues.apache.org/jira/browse/DERBY-1748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843473#action_12843473 ] Gunnar Grim commented on DERBY-1748 : ------------------------------------ Not sure what you mean Geoff, so I'll assume you are thinking of string comparisons  in SQL. All string comparisons I've tested become case insensitive with a database  that uses TERRITORY_BASED:SECONDARY collation. I've tested the following name LIKE 'a%'  matches both "Adam" and "adam" name = 'adam'  matches both "Adam" and "adam" name BETWEEN 'a' AND 'c'  matches both "Baker" and "baker" name >= 'a' AND name <= 'c'  matches both "Baker" and "baker" Using a varchar column as a primary key will consider 'Adam' and 'adam' as duplicates. JOIN's compare case insensitively. -Gunnar – This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
          Hide
          Gunnar Grim added a comment -

          Yes, user documentation on case insensitivity will be needed but my patch has yet to be accepted by the Derby developers.

          Show
          Gunnar Grim added a comment - Yes, user documentation on case insensitivity will be needed but my patch has yet to be accepted by the Derby developers.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks for providing the patch, Gunnar! This looks like a nice improvement. We'd also need some regression tests to verify that the functionality works before we can check it in. If you want to take a stab at it, you could take a look at the existing collation tests and see if they can be extended with tests for this new feature. They can be found under java/testing/org/apache/derbyTesting/functionTests/tests/lang and are called CollationTest and CollationTest2. The wiki contains more info about running and writing Derby regression tests: http://wiki.apache.org/db-derby/DerbyJUnitTesting

          You should probably also sign an ICLA (Individual Contributor License Agreement) and mail/fax it to Apache. See this wiki page: http://wiki.apache.org/db-derby/DerbyDev

          I have one question about the patch. There's a call to Collator.getInstance() in BasicDatabase.getCollator() which is not touched by the patch. Do we need to add a call to setStrength() there as well?

          Show
          Knut Anders Hatlen added a comment - Thanks for providing the patch, Gunnar! This looks like a nice improvement. We'd also need some regression tests to verify that the functionality works before we can check it in. If you want to take a stab at it, you could take a look at the existing collation tests and see if they can be extended with tests for this new feature. They can be found under java/testing/org/apache/derbyTesting/functionTests/tests/lang and are called CollationTest and CollationTest2. The wiki contains more info about running and writing Derby regression tests: http://wiki.apache.org/db-derby/DerbyJUnitTesting You should probably also sign an ICLA (Individual Contributor License Agreement) and mail/fax it to Apache. See this wiki page: http://wiki.apache.org/db-derby/DerbyDev I have one question about the patch. There's a call to Collator.getInstance() in BasicDatabase.getCollator() which is not touched by the patch. Do we need to add a call to setStrength() there as well?
          Hide
          Gunnar Grim added a comment -

          Yes, it seems more than reasonable to set the strength on this collator since it is based on the database locale. Patch updated.

          Show
          Gunnar Grim added a comment - Yes, it seems more than reasonable to set the strength on this collator since it is based on the database locale. Patch updated.
          Hide
          Knut Anders Hatlen added a comment -

          Actually, it looks like the getCollator() method in BasicDatabase is never used. It's probably just dead code that's left over from the early Cloudscape days. I'm attaching a patch that removes the code so that we don't need to worry about it. I'll check in that patch shortly if all the regression tests run cleanly with it.

          Show
          Knut Anders Hatlen added a comment - Actually, it looks like the getCollator() method in BasicDatabase is never used. It's probably just dead code that's left over from the early Cloudscape days. I'm attaching a patch that removes the code so that we don't need to worry about it. I'll check in that patch shortly if all the regression tests run cleanly with it.
          Hide
          Knut Anders Hatlen added a comment -

          Committed remove_dead_code.diff to trunk with revision 922682.

          Show
          Knut Anders Hatlen added a comment - Committed remove_dead_code.diff to trunk with revision 922682.
          Hide
          Gunnar Grim added a comment -

          Here is a new patch based on the latest commit. There is now no change in BasicDatabase and there is a test case in CollationTest.

          I have also faxed a signed ICLA to Apache. I'll take a look at CollationTest2 ASAP.

          Show
          Gunnar Grim added a comment - Here is a new patch based on the latest commit. There is now no change in BasicDatabase and there is a test case in CollationTest. I have also faxed a signed ICLA to Apache. I'll take a look at CollationTest2 ASAP.
          Hide
          Gunnar Grim added a comment -

          Suggested text to add to the section "Creating a database with territory-based collation" in the developers guide.

          Show
          Gunnar Grim added a comment - Suggested text to add to the section "Creating a database with territory-based collation" in the developers guide.
          Hide
          Gunnar Grim added a comment -

          Suggested text for the reference manual, section "collation=collation attribute".

          Show
          Gunnar Grim added a comment - Suggested text for the reference manual, section "collation=collation attribute".
          Hide
          Kim Haase added a comment -

          Thank you, Gunnar, that should give me enough to go on – the dev guide text explains what the strengths mean. I should note that in the devgulde.text one, in my locale, a character in the following came across as a diamond with a question mark in it –

          as well as "a" and "�".

          I'll have to see if there is some locale-neutral way of putting such characters in the manuals.

          Show
          Kim Haase added a comment - Thank you, Gunnar, that should give me enough to go on – the dev guide text explains what the strengths mean. I should note that in the devgulde.text one, in my locale, a character in the following came across as a diamond with a question mark in it – as well as "a" and "�". I'll have to see if there is some locale-neutral way of putting such characters in the manuals.
          Hide
          Gunnar Grim added a comment -

          Sorry about the strange character. The file is saved in ISO-8859-1. The character is "á", see http://www.fileformat.info/info/unicode/char/00e1/index.htm

          Show
          Gunnar Grim added a comment - Sorry about the strange character. The file is saved in ISO-8859-1. The character is "á", see http://www.fileformat.info/info/unicode/char/00e1/index.htm
          Hide
          Kim Haase added a comment -

          I'm working on documenting this new feature. Thanks for the information on it.

          One question: the spec for the dev guide says, "The exact interpretation of the strength attribute is locale dependant." How would a user find out the interpretation for a particular locale? Is there some reference material on the Web for this?

          Thanks.

          Show
          Kim Haase added a comment - I'm working on documenting this new feature. Thanks for the information on it. One question: the spec for the dev guide says, "The exact interpretation of the strength attribute is locale dependant." How would a user find out the interpretation for a particular locale? Is there some reference material on the Web for this? Thanks.
          Hide
          Gunnar Grim added a comment -

          The Wikipedia entry on Collation references this article: http://www.unicode.org/reports/tr10/

          For the normal user I believe it is sufficient to know that the strength values PRIMARY, SECONDARY and TERTIARY handle differences in base characters, accents and case, respectively. If you know your own language you will know what that means. For example, anyone who knows Swedish will know that the letters A and Ä are different even with PRIMARY strength. The dots are not an accent in Swedish although in another language they may well be.

          Therefore, I think that the text I suggested for the dev guide is detailed enough. Perhaps you could add a recommendation to use PRIMARY if you want Derby to behave like MySQL, MS SQL Server and probably most other DBMS'es do by default.

          Show
          Gunnar Grim added a comment - The Wikipedia entry on Collation references this article: http://www.unicode.org/reports/tr10/ For the normal user I believe it is sufficient to know that the strength values PRIMARY, SECONDARY and TERTIARY handle differences in base characters, accents and case, respectively. If you know your own language you will know what that means. For example, anyone who knows Swedish will know that the letters A and Ä are different even with PRIMARY strength. The dots are not an accent in Swedish although in another language they may well be. Therefore, I think that the text I suggested for the dev guide is detailed enough. Perhaps you could add a recommendation to use PRIMARY if you want Derby to behave like MySQL, MS SQL Server and probably most other DBMS'es do by default.
          Hide
          Knut Anders Hatlen added a comment -

          Gunnar's ICLA has been recorded here: http://people.apache.org/~jim/committers.html
          So I think we can start looking at getting the patch committed now. The patch looks good to me, so I plan to commit it if all the regression tests run cleanly with it.

          It would be good to have some more tests to verify that it works as expected with the different strengths (currently, we only test secondary strength), but we can add more tests later.

          Show
          Knut Anders Hatlen added a comment - Gunnar's ICLA has been recorded here: http://people.apache.org/~jim/committers.html So I think we can start looking at getting the patch committed now. The patch looks good to me, so I plan to commit it if all the regression tests run cleanly with it. It would be good to have some more tests to verify that it works as expected with the different strengths (currently, we only test secondary strength), but we can add more tests later.
          Hide
          Knut Anders Hatlen added a comment -

          Kim, I assume you meant to assign DERBY-4591 to yourself? Re-assigning this issue to Gunnar.

          Show
          Knut Anders Hatlen added a comment - Kim, I assume you meant to assign DERBY-4591 to yourself? Re-assigning this issue to Gunnar.
          Hide
          Knut Anders Hatlen added a comment -

          All the regression tests ran cleanly, so I committed the collation-strength-2.diff patch to trunk with revision 929111.
          Thanks for contributing the patch, Gunnar!

          Show
          Knut Anders Hatlen added a comment - All the regression tests ran cleanly, so I committed the collation-strength-2.diff patch to trunk with revision 929111. Thanks for contributing the patch, Gunnar!
          Hide
          Kim Haase added a comment -

          Yes, silly me, it was 4591 I meant to assign to myself. And thanks for committing the code, Knut – I will build Derby and try it out.

          Gunnar, thanks for the advice. I will take it except that perhaps I will not mention specific other databases – to make the doc easier to maintain. Nothing lasts forever ...

          Show
          Kim Haase added a comment - Yes, silly me, it was 4591 I meant to assign to myself. And thanks for committing the code, Knut – I will build Derby and try it out. Gunnar, thanks for the advice. I will take it except that perhaps I will not mention specific other databases – to make the doc easier to maintain. Nothing lasts forever ...
          Hide
          Kim Haase added a comment -

          Actually, I just realized that the patch I proposed doesn't include that last recommendation about using PRIMARY for compatibility with other databases. I will update the patch – sorry.

          Show
          Kim Haase added a comment - Actually, I just realized that the patch I proposed doesn't include that last recommendation about using PRIMARY for compatibility with other databases. I will update the patch – sorry.
          Hide
          Gunnar Grim added a comment -

          Knut, are you sure I should be assigned this issue? I am but a lowly contributor with very few rights and no idea what to do with the issue at this stage.

          Show
          Gunnar Grim added a comment - Knut, are you sure I should be assigned this issue? I am but a lowly contributor with very few rights and no idea what to do with the issue at this stage.
          Hide
          Knut Anders Hatlen added a comment -

          Gunnar, we normally use the "assigned to" field to indicate who has been the main contributor, and since you wrote the code, that's you.

          Show
          Knut Anders Hatlen added a comment - Gunnar, we normally use the "assigned to" field to indicate who has been the main contributor, and since you wrote the code, that's you.
          Hide
          Mamta A. Satoor added a comment -

          I saw following test failure on a Linux machine running with IBM jdk16 and thought it might be related to this jira entry.
          1) testSwedishCaseInsensitiveCollation(org.apache.derbyTesting.functionTests.tests.lang.CollationTest)junit.framework.AssertionFailedError: Column value mismatch @ column 'ID', row 1:
          Expected: >6<
          Found: >4<
          at org.apache.derbyTesting.junit.JDBC.assertRowInResultSet(JDBC.java(Compiled Code))
          at org.apache.derbyTesting.junit.JDBC.assertRowInResultSet(JDBC.java(Inlined Compiled Code))
          at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java(Compiled Code))
          at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java(Inlined Compiled Code))
          at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java(Compiled Code))
          at org.apache.derbyTesting.functionTests.tests.lang.CollationTest.checkLangBasedQuery(CollationTest.java:1324)
          at org.apache.derbyTesting.functionTests.tests.lang.CollationTest.testSwedishCaseInsensitiveCollation(CollationTest.java:561)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java(Compiled Code))
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java(Compiled Code))
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java(Compiled Code))
          at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
          at junit.extensions.TestSetup.run(TestSetup.java:23)
          at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
          at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22)
          at junit.extensions.TestSetup$1.protect(TestSetup.java:19)
          at junit.extensions.TestSetup.run(TestSetup.java:23)

          Show
          Mamta A. Satoor added a comment - I saw following test failure on a Linux machine running with IBM jdk16 and thought it might be related to this jira entry. 1) testSwedishCaseInsensitiveCollation(org.apache.derbyTesting.functionTests.tests.lang.CollationTest)junit.framework.AssertionFailedError: Column value mismatch @ column 'ID', row 1: Expected: >6< Found: >4< at org.apache.derbyTesting.junit.JDBC.assertRowInResultSet(JDBC.java(Compiled Code)) at org.apache.derbyTesting.junit.JDBC.assertRowInResultSet(JDBC.java(Inlined Compiled Code)) at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java(Compiled Code)) at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java(Inlined Compiled Code)) at org.apache.derbyTesting.junit.JDBC.assertFullResultSet(JDBC.java(Compiled Code)) at org.apache.derbyTesting.functionTests.tests.lang.CollationTest.checkLangBasedQuery(CollationTest.java:1324) at org.apache.derbyTesting.functionTests.tests.lang.CollationTest.testSwedishCaseInsensitiveCollation(CollationTest.java:561) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java(Compiled Code)) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java(Compiled Code)) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java(Compiled Code)) at org.apache.derbyTesting.junit.BaseTestCase.runBare(BaseTestCase.java:109) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.extensions.TestSetup.run(TestSetup.java:23) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.extensions.TestSetup.run(TestSetup.java:23) at org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:22) at junit.extensions.TestSetup$1.protect(TestSetup.java:19) at junit.extensions.TestSetup.run(TestSetup.java:23)
          Hide
          Knut Anders Hatlen added a comment -

          Myrna has logged the regression test failure as DERBY-4604. It appears to be a JVM bug. See more details there.

          Show
          Knut Anders Hatlen added a comment - Myrna has logged the regression test failure as DERBY-4604 . It appears to be a JVM bug. See more details there.

            People

            • Assignee:
              Gunnar Grim
              Reporter:
              Terry
            • Votes:
              9 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development