Cayenne
  1. Cayenne
  2. CAY-1210

mysql does not use index for case insensitive searches

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1M3
    • Component/s: Database integration
    • Labels:
      None

      Description

      When performing a case insensitive search Cayenne spits out SQL which looks like this

      SELECT .... WHERE upper(name) LIKE upper("fred")

      This prevents any index being used for the search. Since mysql already performed case insensitive searches on text fields we need to suppress the 'upper' functions being used in these situations. All searches on these fields are already case insensitive.

      http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html

      1. case-insensitive-search.patch
        14 kB
        Dzmitry Kazimirchyk
      2. case-insensitive-search.patch
        25 kB
        Dzmitry Kazimirchyk

        Activity

        Hide
        Ari Maniatis added a comment -

        From a conversation with Andrus, the following problem was identified: users using non-default collations *_bin or *_cs will have their behaviour changed by this change

        As it seems that any solution here is going to be a compromise, I guess it has to be implemented as a flag in MySQLAdapter that defines one or the other strategy. AutoAdapter can set this flag based on the database default collation:

        ==> show variables like "collation_database";

        If a user needs different behavior, they will have to set it manually per adapter.

        Alternatively, Cayenne modeler will need to be taught about collations so they can be stored against the dbcolumn. I don't think this is a MySQL specific problem, since collation affects the behaviour of other databases. It is just that MySQL doesn't give us a workaround with being able to create a functional index on UPPER(name).

        For what it is worth, we have this identical problem in WebObjects and need to find a similar fix there.

        Show
        Ari Maniatis added a comment - From a conversation with Andrus, the following problem was identified: users using non-default collations *_bin or *_cs will have their behaviour changed by this change As it seems that any solution here is going to be a compromise, I guess it has to be implemented as a flag in MySQLAdapter that defines one or the other strategy. AutoAdapter can set this flag based on the database default collation: ==> show variables like "collation_database"; If a user needs different behavior, they will have to set it manually per adapter. Alternatively, Cayenne modeler will need to be taught about collations so they can be stored against the dbcolumn. I don't think this is a MySQL specific problem, since collation affects the behaviour of other databases. It is just that MySQL doesn't give us a workaround with being able to create a functional index on UPPER(name). For what it is worth, we have this identical problem in WebObjects and need to find a similar fix there.
        Hide
        Ari Maniatis added a comment -

        I'd like to solve this one since it also affects other database types and is a bit of a performance issue for us right now. What about this as a solution:

        • Add new column to Cayenne modeler in the dbAttribute called "case sensitive". It is a checkbox which is ticked by default (to preserve existing behaviour).
        • Change the SQL generator to not spit out UPPER for those fields
        Show
        Ari Maniatis added a comment - I'd like to solve this one since it also affects other database types and is a bit of a performance issue for us right now. What about this as a solution: Add new column to Cayenne modeler in the dbAttribute called "case sensitive". It is a checkbox which is ticked by default (to preserve existing behaviour). Change the SQL generator to not spit out UPPER for those fields
        Hide
        Andrus Adamchik added a comment - - edited

        As recap of this thread on the dev list: http://markmail.org/message/pakh6dyzfsi5d2hr I am documenting a solution that we'll implement in 3.1:

        • A new configuration property will be introduced: "cayenne.runtime.likeclause.ci" (default false, can be set to true).
        • LIKE IGNORE CASE SelectQuery and EJBQL translators should read this property from RuntimeProperties service (meaning the property can be set either via DI or from command line with -Dcayenne.runtime.likeclause.ci=true) and skip conversion to UPPER if it is true.

        Since translator will still be defined by DbAdapter, some databases may chose to ignore this property.

        Show
        Andrus Adamchik added a comment - - edited As recap of this thread on the dev list: http://markmail.org/message/pakh6dyzfsi5d2hr I am documenting a solution that we'll implement in 3.1: A new configuration property will be introduced: "cayenne.runtime.likeclause.ci" (default false, can be set to true). LIKE IGNORE CASE SelectQuery and EJBQL translators should read this property from RuntimeProperties service (meaning the property can be set either via DI or from command line with -Dcayenne.runtime.likeclause.ci=true) and skip conversion to UPPER if it is true. Since translator will still be defined by DbAdapter, some databases may chose to ignore this property.
        Hide
        Ari Maniatis added a comment -

        To capture the opposite problem (cs search on ci column) , should cayenne.runtime.likeclause.ci have 3 states:

        • SAFE (we add UPPER and BINARY if the adapter allows it)
        • CI (we add BINARY if the adapter allows it)
        • CS (we add UPPER)
        Show
        Ari Maniatis added a comment - To capture the opposite problem (cs search on ci column) , should cayenne.runtime.likeclause.ci have 3 states: SAFE (we add UPPER and BINARY if the adapter allows it) CI (we add BINARY if the adapter allows it) CS (we add UPPER)
        Hide
        Andrus Adamchik added a comment -

        I think the purpose of this property is somewhat analogous to your suggestion of tagging individual DbAttributes as CI or CS. So use of BINARY should probably be implied for CI. So suggesting a different property naming:

        cayenne.runtime.db.collation.assume.ci = true | false

        If true: BINARY is used for CS search, UPPER is dropped for CI search
        If false (default): no change from current policy.

        Show
        Andrus Adamchik added a comment - I think the purpose of this property is somewhat analogous to your suggestion of tagging individual DbAttributes as CI or CS. So use of BINARY should probably be implied for CI. So suggesting a different property naming: cayenne.runtime.db.collation.assume.ci = true | false If true: BINARY is used for CS search, UPPER is dropped for CI search If false (default): no change from current policy.
        Hide
        Dzmitry Kazimirchyk added a comment -

        Attaching patch, which adds suppression of "upper" in QualifierTranslator basing on "cayenne.runtime.db.collation.assume.ci" runtime property.

        Show
        Dzmitry Kazimirchyk added a comment - Attaching patch, which adds suppression of "upper" in QualifierTranslator basing on "cayenne.runtime.db.collation.assume.ci" runtime property.
        Hide
        Dzmitry Kazimirchyk added a comment -

        Improvement of previous patch. It adds BINARY usage for MySQL CS queries Select and EJBQL when ci is true.
        Also I figured out that not only MySQL may have case insensitive databases: SQLServer and Derby have such dbs too. So, putting UPPER suppression to generic QualifierTranslator level probably has sense.

        Show
        Dzmitry Kazimirchyk added a comment - Improvement of previous patch. It adds BINARY usage for MySQL CS queries Select and EJBQL when ci is true. Also I figured out that not only MySQL may have case insensitive databases: SQLServer and Derby have such dbs too. So, putting UPPER suppression to generic QualifierTranslator level probably has sense.
        Hide
        Andrus Adamchik added a comment -

        Would be nice if we could test this somehow. Don't know how we can create a table with the right collation with unit test means that we have, but maybe at least create a regression test - running queries with alternative syntax without checking the results? Otherwise I think the patch can be applied.

        Show
        Andrus Adamchik added a comment - Would be nice if we could test this somehow. Don't know how we can create a table with the right collation with unit test means that we have, but maybe at least create a regression test - running queries with alternative syntax without checking the results? Otherwise I think the patch can be applied.
        Hide
        Ari Maniatis added a comment -

        If changing the property could be done as a command line parameter to the maven test suite, then we could easily run ALL the tests in Jenkins with the switch both ways, and against all databases. We'd also want (obviously) to switch the collation of the test databases based on this parameter as we run through the tests.

        Show
        Ari Maniatis added a comment - If changing the property could be done as a command line parameter to the maven test suite, then we could easily run ALL the tests in Jenkins with the switch both ways, and against all databases. We'd also want (obviously) to switch the collation of the test databases based on this parameter as we run through the tests.
        Hide
        Dzmitry Kazimirchyk added a comment -

        Committed last patch with some changes (due to CAY-1618). Speaking about testing: everything works as expected on my local environment. And it will be great if we can run tests on Jenkins with different dbs and collations.

        Show
        Dzmitry Kazimirchyk added a comment - Committed last patch with some changes (due to CAY-1618 ). Speaking about testing: everything works as expected on my local environment. And it will be great if we can run tests on Jenkins with different dbs and collations.
        Hide
        Ari Maniatis added a comment -

        Dzmitry, can we pass -Dcayenne.runtime.db.collation.assume.ci=true on the command line? If not, could you make it possible, then I'll add that to the Jenkins tests.

        Show
        Ari Maniatis added a comment - Dzmitry, can we pass -Dcayenne.runtime.db.collation.assume.ci=true on the command line? If not, could you make it possible, then I'll add that to the Jenkins tests.
        Hide
        Dzmitry Kazimirchyk added a comment -

        Committed the changes which will enable us to do this. Anyway, we still need to configure Jenkins to work with our new version naming pattern: http://markmail.org/message/m4x3lxtcoxlsi6r2

        Show
        Dzmitry Kazimirchyk added a comment - Committed the changes which will enable us to do this. Anyway, we still need to configure Jenkins to work with our new version naming pattern: http://markmail.org/message/m4x3lxtcoxlsi6r2
        Hide
        Ari Maniatis added a comment -

        Dima, is this task all done other than adding some testing to Jenkins? If so, I think we should close the issue.

        Show
        Ari Maniatis added a comment - Dima, is this task all done other than adding some testing to Jenkins? If so, I think we should close the issue.
        Hide
        Andrus Adamchik added a comment -

        Both 3.1 and 3.2 support 'cayenne.runtime.db.collation.assume.ci'. It was committed on Sept 14, 2011, i.e. after the patches. So I added it ti old release notes, and now closing the issue. If there's anything else we need to do, let's jira it separately.

        Show
        Andrus Adamchik added a comment - Both 3.1 and 3.2 support 'cayenne.runtime.db.collation.assume.ci'. It was committed on Sept 14, 2011, i.e. after the patches. So I added it ti old release notes, and now closing the issue. If there's anything else we need to do, let's jira it separately.

          People

          • Assignee:
            Ari Maniatis
            Reporter:
            Ari Maniatis
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development