Derby
  1. Derby
  2. DERBY-1386

Wrong results with query using LIKE and ESCAPE clause that includes "%", "\", and "_"

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 10.1.3.1, 10.2.1.6
    • Fix Version/s: 10.1.3.1, 10.2.1.6
    • Component/s: SQL
    • Labels:
      None
    • Issue & fix info:
      Patch Available
    • Bug behavior facts:
      Regression

      Description

      After the fix for DERBY-1262 was checked in, I'm noticing that the following query now returns different results. Prior to the fix for DERBY-1262 the query returned 2 rows; now it doesn't return any rows.

      create table escTable (c1 char(10));
      insert into escTable values ('%_\a');
      insert into escTable values ('%_b');
      insert into escTable values ('%c');
      insert into escTable values ('d');
      insert into escTable values ('%_\e');
      select c1 from escTable where c1 like '%_
      %' ESCAPE '\';

      Before DERBY-1262, the SELECT returned:

      C1
      ----------
      %_\a
      %_\e

      2 rows selected

      Now it returns:

      C1
      ----------

      0 rows selected

      Brief inspection of the query and data suggest to me that these new results (i.e. no rows) are wrong, and that Derby should in fact return 2 rows/.

      Based on comments in DERBY-1262, I'm creating a new Jira issue for the regression since it has been checked into the 10.1 maintenance branch. I've set the priority to "Critical" since this could potentially delay a 10.1.3 release--I.e. I don't think we'd want to release 10.1.3 knowing that we have a wrong results regression. But if anyone thinks that's not the correct priority, feel free to speak up.

      Other option, of course, is to back out the change for DERBY-1262 in 10.1 and then lower the priority accordingly.

      Input/feedback/comments would be appreciated.

      1. derby-1386-v1.stat
        0.2 kB
        Knut Anders Hatlen
      2. derby-1386-v1.diff
        9 kB
        Knut Anders Hatlen

        Issue Links

          Activity

          Hide
          A B added a comment -

          Oops, forgot to change the priority to Critical, per my comments in the description.

          Show
          A B added a comment - Oops, forgot to change the priority to Critical, per my comments in the description.
          Hide
          Knut Anders Hatlen added a comment -

          I see the same results. However, when I use like to test the same values as constants, it correctly evaluates to true:

          ij> select * from sysibm.sysdummy1 where '%_\a' like '%_
          %' escape '\';
          IBM&


          Y

          1 row selected

          Show
          Knut Anders Hatlen added a comment - I see the same results. However, when I use like to test the same values as constants, it correctly evaluates to true: ij> select * from sysibm.sysdummy1 where '%_\a' like '%_ %' escape '\'; IBM& Y 1 row selected
          Hide
          Knut Anders Hatlen added a comment -

          There is a bug in the unoptimized version of
          Like.lessThanString(). The bug was not introduced by DERBY-1262, just
          exposed by it. I will try to fix it in trunk and 10.1.

          lessThanString() searches for the first occurrence of % or _ in the
          pattern string, and if the preceding character is an escape character,
          it is skipped. This leads to incorrect behaviour when % or _ is
          preceded by an escaped escape character.

          Show
          Knut Anders Hatlen added a comment - There is a bug in the unoptimized version of Like.lessThanString(). The bug was not introduced by DERBY-1262 , just exposed by it. I will try to fix it in trunk and 10.1. lessThanString() searches for the first occurrence of % or _ in the pattern string, and if the preceding character is an escape character, it is skipped. This leads to incorrect behaviour when % or _ is preceded by an escaped escape character.
          Hide
          Knut Anders Hatlen added a comment -

          Attached a patch (derby-1386-v1.diff) which removes the old, unused
          version of lessThanString (the so-called optimized version) and
          rewrites the unoptimized version.

          First a description of what lessThanString() is supposed to do
          (somewhat simplified):

          When an SQL statement is compiled, LIKE expressions are rewritten to
          make them more efficient. For instance,

          SELECT * FROM T WHERE X LIKE 'abc_ef%'

          is rewritten to

          SELECT * FROM T WHERE X >= 'abc' AND X < 'abd' AND X LIKE 'abc_ef%'

          (Well, 'abc' and 'abd' will actually be padded with nulls up to the
          max length of X, but you get the point...)

          Rewriting the statement like this is done to reduce the number of LIKE
          comparisons (which are more expensive than >= and <) and to make it
          possible to use the index.

          The role of lessThanString() is to generate the string used in the
          less than expression ('abd' in the example above). This string is the
          sub-string leading up to the first wildcard, but with the last
          character incremented by one (and, again, the string is padded with
          nulls).

          Currently, lessThanString() does this:

          1) Find the first wildcard character, if it is preceded by an escape
          character go to the next wildcard. (The error is in this step,
          because the escape character can be escaped.)

          2) Take the character right before the first wildcard and increment
          it.

          3) Build a new string of the pattern string up to the incremented
          character, but without the escape characters.

          4) Pad the string with nulls and return it.

          With the patch, it does the following:

          1) Build a new string based on the pattern string by walking the
          pattern string, skipping escape characters and stopping when the
          first unescaped wildcard is found.

          2) Increment the last character of the newly built string.

          3) Pad the string with nulls and return it.

          Other changes in the patch:

          • the unused version of lessThanString() was removed
          • some comments about what lessThanString() does were changed since
            the code didn't do what they said
          • new test cases in lang/dynamicLikeOptimization.sql

          Derbyall ran without errors on Sun JVM 1.5.0 / Solaris 10 x86.

          Reviews would be most appreciated! Thanks!

          Show
          Knut Anders Hatlen added a comment - Attached a patch (derby-1386-v1.diff) which removes the old, unused version of lessThanString (the so-called optimized version) and rewrites the unoptimized version. First a description of what lessThanString() is supposed to do (somewhat simplified): When an SQL statement is compiled, LIKE expressions are rewritten to make them more efficient. For instance, SELECT * FROM T WHERE X LIKE 'abc_ef%' is rewritten to SELECT * FROM T WHERE X >= 'abc' AND X < 'abd' AND X LIKE 'abc_ef%' (Well, 'abc' and 'abd' will actually be padded with nulls up to the max length of X, but you get the point...) Rewriting the statement like this is done to reduce the number of LIKE comparisons (which are more expensive than >= and <) and to make it possible to use the index. The role of lessThanString() is to generate the string used in the less than expression ('abd' in the example above). This string is the sub-string leading up to the first wildcard, but with the last character incremented by one (and, again, the string is padded with nulls). Currently, lessThanString() does this: 1) Find the first wildcard character, if it is preceded by an escape character go to the next wildcard. (The error is in this step, because the escape character can be escaped.) 2) Take the character right before the first wildcard and increment it. 3) Build a new string of the pattern string up to the incremented character, but without the escape characters. 4) Pad the string with nulls and return it. With the patch, it does the following: 1) Build a new string based on the pattern string by walking the pattern string, skipping escape characters and stopping when the first unescaped wildcard is found. 2) Increment the last character of the newly built string. 3) Pad the string with nulls and return it. Other changes in the patch: the unused version of lessThanString() was removed some comments about what lessThanString() does were changed since the code didn't do what they said new test cases in lang/dynamicLikeOptimization.sql Derbyall ran without errors on Sun JVM 1.5.0 / Solaris 10 x86. Reviews would be most appreciated! Thanks!
          Hide
          A B added a comment -

          I reviewed this patch and the changes look correct to me. The old code matches the problem that Knut identified and the new code addresses the issues and does what is described in the previous comment. I also verified that the repro passes with the patch applied, and that the new test cases fail without the patch and pass with it.

          I haven't run derbyall myself but since Knut ran it and said everything passed, I vote +1 to have this committed.

          Thanks again for volunteering to address this, Knut, and for doing it so quickly.

          Show
          A B added a comment - I reviewed this patch and the changes look correct to me. The old code matches the problem that Knut identified and the new code addresses the issues and does what is described in the previous comment. I also verified that the repro passes with the patch applied, and that the new test cases fail without the patch and pass with it. I haven't run derbyall myself but since Knut ran it and said everything passed, I vote +1 to have this committed. Thanks again for volunteering to address this, Knut, and for doing it so quickly.
          Hide
          Knut Anders Hatlen added a comment -

          Thanks Army for reporting the issue and reviewing the patch so quickly that we could make it before the 10.1.3 code freeze.

          I have now run derbyall on 10.1 (Sun JVM 1.5.0, Solaris 10 x86) with two failures. One is T_Diagnosticable, which Andreas just reported that he has seen on Solaris as well, and the other one is testSecMec, which fails because the message text is slightly different from the canon.

          Committed the patch into trunk with revision 413123 and into 10.1 with revision 413124. I have verified that the example provided by Army returns the correct results on both trunk and 10.1.

          Show
          Knut Anders Hatlen added a comment - Thanks Army for reporting the issue and reviewing the patch so quickly that we could make it before the 10.1.3 code freeze. I have now run derbyall on 10.1 (Sun JVM 1.5.0, Solaris 10 x86) with two failures. One is T_Diagnosticable, which Andreas just reported that he has seen on Solaris as well, and the other one is testSecMec, which fails because the message text is slightly different from the canon. Committed the patch into trunk with revision 413123 and into 10.1 with revision 413124. I have verified that the example provided by Army returns the correct results on both trunk and 10.1.
          Hide
          A B added a comment -

          Fix verified in 10.1 and 10.2, so I'm marking it closed. Thanks Knut Anders!

          Show
          A B added a comment - Fix verified in 10.1 and 10.2, so I'm marking it closed. Thanks Knut Anders!

            People

            • Assignee:
              Knut Anders Hatlen
              Reporter:
              A B
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development