The following code, which uses the regexp1.2 package, gets different results when running on windows NT and linux, both running the javasoft jdk 1.3. The problem seems to be with the handling of \n in multiline matches. See the comment at the beginning of the class for some more detail.

--- BadRE.java ---

import org.apache.regexp.*;

// The following class defines a regular expression and attempts to
// match a text string with it. The regular expression is trying to
// match the literal "window.location.href=" at the beginning of a
// line, following any number of space characters.
//
// The results are different on Windows NT and Linux. On linux
// running sun jdk1.3, it matches. On Windows NT Workstation running
// sun jdk1.3, it doesn't match.
//
// If the \n is removed from the beginning of the input string then it
// matches on windows and linux.
// If bol (beginning-of-line) is changed from "^[ \t]*" to "^[ \t\n]*" then
// it matches on windows and linux.
// If bol is changed to "^\n[ \t]*" then it matches on windows and linux.
//
public class BadRE {
    public static RE makeRE() throws RESyntaxException {
        String bol = "^[ \t]*";
        String regexp = bol + "window.location.href=";
        RE matchRE = new RE(regexp, RE.MATCH_MULTILINE | RE.MATCH_CASEINDEPENDENT);
        return matchRE;
    }

    public static void test() throws RESyntaxException {
        String input = "\nwindow.location.href=";
        RE re = makeRE();
        if (re.match(input)) {
            System.out.println("match: " + re.getParen(0));
        } else {
            System.out.println("no match");
        }
    }

    public static void main(String [] args) throws RESyntaxException {
        test();
    }
}
*** Bug 4183 has been marked as a duplicate of this bug. ***
I think the problem is that on Linux/Unix line.separator is "\n", but on Windows it is "\r\n". So on Linux the string being matched is treated as two lines of text (and the last line matches the regexp), but on Windows it is treated as a single line of text, and that line doesn't match the regexp.

To correct the test we should use

    String input = System.getProperty("line.separator") + "window.location.href=";

So, I would say that the test is incorrect.
Created attachment 9656 [details] Suggested fix for the bug
Nice patch! I'll test it and apply as soon as I have a bit of time... (ps: your patches look different... do you/can you use "diff -u"?)
> To correct test we should use
> String input = System.getProperty("line.separator") + "window.location.href=";

The input comes from a remote web server, so the newline sequence used in the data is not related to the platform that the code is running on.
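To illustrate the point about remote data, here is a minimal sketch of a line-start match that does not depend on the platform's line.separator. It uses java.util.regex as a stand-in, not org.apache.regexp, so it says nothing about how the regexp package behaves. The class and method names are made up for this example, and the dots in the pattern are escaped here, unlike in BadRE.java.

```java
import java.util.regex.Pattern;

// Illustrative only: java.util.regex stand-in for the RE built in BadRE.java.
class LineStartDemo {
    // In MULTILINE mode, java.util.regex lets ^ match after \n, \r\n, \r,
    // \u0085, \u2028 and \u2029, whatever line.separator is on the host.
    private static final Pattern LINE_START = Pattern.compile(
            "^[ \\t]*window\\.location\\.href=",
            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the literal starts some line of the input.
    static boolean foundAtLineStart(String input) {
        return LINE_START.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "\nwindow.location.href=",     // Unix-style newline
            "\r\nwindow.location.href=",   // Windows-style newline
            "\r  window.location.href=",   // bare CR, then spaces
            "x window.location.href="      // not at the start of a line
        };
        for (String input : inputs) {
            System.out.println(foundAtLineStart(input));
        }
    }
}
```

The first three inputs differ only in the newline sequence the sender used, which is the situation described above for data fetched from a web server.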
Created attachment 9746 [details] Suggested fix in unified format.
Created attachment 9872 [details] Additional patch: OP_ANY (.) did only check for \n but should use new method isNewline
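For readers without the attachment, a rough sketch of what a newline check of this kind might look like. This is not the code from the patch: the signature, the class name and the exact set of characters are assumptions (the set is taken from the characters exercised by the test case later in this thread), and anyMatches is a hypothetical model of the OP_ANY rule, not the library's matcher.

```java
// Hypothetical sketch; the real isNewline in the attached patch may differ.
final class NewlineCheck {
    // Assumed set of line terminators: LF, CR, NEL, LINE SEPARATOR,
    // PARAGRAPH SEPARATOR.
    static boolean isNewline(char c) {
        switch (c) {
            case '\n':
            case '\r':
            case '\u0085':
            case '\u2028':
            case '\u2029':
                return true;
            default:
                return false;
        }
    }

    // Hypothetical model of OP_ANY (.): accept any character that is not
    // a line terminator, instead of comparing against '\n' only.
    static boolean anyMatches(char c) {
        return !isNewline(c);
    }

    public static void main(String[] args) {
        String sample = "a\r\nb";
        for (int i = 0; i < sample.length(); i++) {
            char c = sample.charAt(i);
            System.out.println((int) c + " -> " + anyMatches(c));
        }
    }
}
```

Routing both the ^ handling and the '.' handling through one helper like this keeps the two notions of "end of line" from drifting apart.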
Oleg,

I've added the following test case:

    r = new RE("^a.*b$", RE.MATCH_MULTILINE);
    if (!r.match("a\nb")) {
        fail("\"a\\nb\" doesn't match");
    }
    if (!r.match("a\rb")) {
        fail("\"a\\rb\" doesn't match");
    }
    if (!r.match("a\r\nb")) {
        fail("\"a\\r\\nb\" doesn't match");
    }
    if (!r.match("a\u0085b")) {
        fail("\"a\\u0085b\" doesn't match");
    }
    if (!r.match("a\u2028b")) {
        fail("\"a\\u2028b\" doesn't match");
    }
    if (!r.match("a\u2029b")) {
        fail("\"a\\u2029b\" doesn't match");
    }

And two of them fail:

    [java] "a\nb" doesn't match
    [java] "a\r\nb" doesn't match

Do you have a suggestion as to what's wrong here?

Hendrik,

With your patch and the test above, several tests fail.

Vadim
Oops, I got it wrong. '.' should not match a newline in MULTILINE mode. The correct test is:

    // Test MATCH_MULTILINE. Test that '.' does not match a newline.
    r = new RE("^a.*b$", RE.MATCH_MULTILINE);
    if (r.match("a\nb")) {
        fail("\"a\\nb\" matches \"^a.*b$\"");
    }
    if (r.match("a\rb")) {
        fail("\"a\\rb\" matches \"^a.*b$\"");
    }
    if (r.match("a\r\nb")) {
        fail("\"a\\r\\nb\" matches \"^a.*b$\"");
    }
    if (r.match("a\u0085b")) {
        fail("\"a\\u0085b\" matches \"^a.*b$\"");
    }
    if (r.match("a\u2028b")) {
        fail("\"a\\u2028b\" matches \"^a.*b$\"");
    }
    if (r.match("a\u2029b")) {
        fail("\"a\\u2029b\" matches \"^a.*b$\"");
    }

And Hendrik's patch is working ok.

Vadim
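For comparison, the same expectation can be checked against java.util.regex, where '.' also excludes line terminators unless DOTALL is set. This is only a cross-check with a different engine, assuming its find() is a fair analogue of RE.match(); the class and method names are invented for the example.

```java
import java.util.regex.Pattern;

// Illustrative cross-check with java.util.regex, not org.apache.regexp.
class DotNewlineDemo {
    private static final Pattern P = Pattern.compile("^a.*b$", Pattern.MULTILINE);

    // Hypothetical helper: true if some line of the input matches ^a.*b$.
    static boolean matches(String input) {
        return P.matcher(input).find();
    }

    public static void main(String[] args) {
        // Same inputs as the test case: 'a' and 'b' separated by a line terminator.
        String[] separated = {
            "a\nb", "a\rb", "a\r\nb", "a\u0085b", "a\u2028b", "a\u2029b"
        };
        for (String input : separated) {
            // '.' cannot cross the terminator, so no single line is "a...b".
            System.out.println(matches(input));
        }
        // Control case: both letters are on the same line.
        System.out.println(matches("x\na--b\ny"));
    }
}
```

None of the six separated inputs should match, while the control input should, because its middle line starts with 'a' and ends with 'b'.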
Patches applied, thanks to everybody. Vadim