Apache OpenOffice (AOO) Bugzilla – Issue 46165
Regular expressions work inconsistently or not at all when combined.
Last modified: 2013-08-07 14:38:26 UTC
I know that "^$" finds empty paragraphs, but it is not possible to find sequences such as "^$^$" when looking for two empty paragraphs next to each other.
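For comparison, here is a minimal sketch of what the reporter is asking for, using Python's `re` module rather than the OOo engine. It assumes each paragraph is modelled as one line of a newline-separated string, which is not how Writer stores paragraphs; the variable names are illustrative only.

```python
import re

# Assumption: each paragraph is one line of a newline-separated string.
text = "first\n\n\nsecond\n\nthird"

# With re.MULTILINE, ^ and $ match at every line boundary, so "^$" finds
# each empty line. "^$\n^$" additionally requires the next line to be
# empty too, i.e. two empty lines in a row.
single = [m.start() for m in re.finditer(r"^$", text, flags=re.MULTILINE)]
double = [m.start() for m in re.finditer(r"^$\n^$", text, flags=re.MULTILINE)]

print(single)
print(double)
```

The key difference is that the pattern is allowed to consume the break between two lines, which the search described above apparently cannot do across paragraphs.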
Reassigned
as described
The feature that I miss most in OO is an ability to "Search and Replace" the new paragraph, new line, or <CR> character. If you search for the regular expression \n, you can find "soft new line" or <Shift><Enter> and can also replace it with a hard return or <Enter> but as far as I can tell, you cannot search for the <Enter> in a document.
Also (and very similar to this issue): the regular expression (REGEX) support is not capable enough.
- Can't search for /text to be italic/ or *text to be bold* (no ability to address the content of a match, only the whole match).
- Can't replace text(space)(newline)text with text(space)text (REGEX only understands 'this line/paragraph' as a search target). WORK-AROUND: cut/paste to another application (I use EditPad Pro), fix the text using REGEX search/replace, then cut/paste back.
- The current method, find [ ^$ ], replace [(nothing)], also has problems (it will ALSO remove any page break attached to blank lines).
- Find/replace that only changes attributes is also unsafe: find [(REGEX expression)], replace [&] with bold/italic will sometimes result in '&' (the character) replacing the target.
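The first two requests in this comment correspond to capture groups and to patterns that span a line boundary. The sketch below shows both in Python's `re` module, not in OOo. The `<i>`/`<b>` tags are only a stand-in for Writer's character attributes, and the sample text is made up.

```python
import re

sample = "This is /important/ and *urgent*.\nline one \nline two"

# Group 1 captures the text between the delimiters; the replacement refers
# to it as \1, so the delimiters themselves can be dropped or swapped.
italic = re.sub(r"/([^/\n]+)/", r"<i>\1</i>", sample)
bold = re.sub(r"\*([^*\n]+)\*", r"<b>\1</b>", italic)

# Join "text(space)(newline)text" into "text(space)text" by letting the
# pattern consume the newline that follows a trailing space.
joined = re.sub(r" \n", " ", bold)

print(joined)
```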
*** Issue 75214 has been marked as a duplicate of this issue. ***
SBA: I adjusted the summary to reflect the general problem. From issue 75214, there are more examples:
(1) "\n" and "\t" do not work in square brackets
(2) "^" works differently in square brackets
(3) "\n" finds a line break but inserts a paragraph break
I do not regard this as a collective issue. The many existing issues about Regular Expressions show that this area needs a "general rework" so that regular expressions may become a consistent, powerful and INTUITIVE tool for the user.
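As a point of reference for examples (1) and (2), this is how one widely used engine, Python's `re`, treats escapes and the caret inside square brackets. It is only a comparison sketch and says nothing about how OOo implements character classes.

```python
import re

text = "a\tb\nc^d"

# Inside [...] the escapes \n and \t still denote newline and tab.
whitespace_hits = re.findall(r"[\n\t]", text)

# A leading ^ negates the class; anywhere else in the class it is a
# literal caret character.
not_abcd = re.findall(r"[^abcd]", text)
literal_caret = re.findall(r"[d^]", text)

print(whitespace_hits, not_abcd, literal_caret)
```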
*** Issue 70554 has been marked as a duplicate of this issue. ***
> .. this area needs a "general rework" so that regular expressions may
> become a consistent, powerful and INTUITIVE tool for the user.
Have been considering a spec for just that, and will try to refine it and post it as an attachment to this issue. Is there any consensus on what level we need to target: e.g. (a) expert regex user, (b) journeyman regex, or (c) basic regex + 'non-regex' wrapper/wizard for neophytes? By (b) I mean: ".. not as good as, say, EditPad Pro, but better than that other rival product."
For (c) we have RFE http://www.openoffice.org/issues/show_bug.cgi?id=63074 already. We also have RFE http://www.openoffice.org/issues/show_bug.cgi?id=28913
When I checked, there were no non-greedy patterns.
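To illustrate what "non-greedy" means here, a short sketch in Python's `re` (not OOo): the `?` after a quantifier makes it match as little as possible. The sample string is made up.

```python
import re

text = "<b>one</b> and <b>two</b>"

# Greedy: .* runs to the last possible </b>, giving one long match.
greedy = re.findall(r"<b>.*</b>", text)

# Non-greedy: .*? stops at the first </b>, giving one match per element.
lazy = re.findall(r"<b>.*?</b>", text)

print(greedy)
print(lazy)
```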
"My" issue, http://www.openoffice.org/issues/show_bug.cgi?id=70554 was closed as a duplicate of this one. I've been looking at http://www.openoffice.org/issues/show_bug.cgi?id=15666, which looked promising, but now doesn't seem to address the problems in this issue or 70554. In short, issue 70554 is about giving an ordinary user the ability to
- search and find line breaks, any kind
- search and find paragraph breaks, any kind
- substitute any of the above, be it one or many, with one or many of any combination of the above.
Some *potentially* interesting info from drking in 15666 (which is targeted for 2.4): "The good news is that if OOo migrates to the ICU regex engine, many of the existing issues may be resolved at a stroke. Although (looking at the ICU regex spec) probably not all of them." How likely this is to happen, or whether it would indeed solve this issue, I don't know. Is there a "general rework" in progress? What issue should one look at to see what's up, if any?
@SBA: in 70554 you say "You can as well produce an entire specification (better use the spec template :-) and attach it to that issue.". What do you mean by spec template? If there's something I am able to do that may contribute to getting this fixed, I'll do what I can, of course.
Reassigned to ama
added myself as cc
Since I can't find an answer to the question I asked below, and the behaviour in OOo 2.4 has not improved (e.g. substituting line breaks is still broken), I ask again: @SBA: in 70554 you say "You can as well produce an entire specification (better use the spec template :-) and attach it to that issue.". What do you mean by spec template? If there's something I am able to do that may contribute to getting this fixed, I'll do what I can, of course.
Experimenting with 2.4 in conjunction with issue 15666 gives some hints about two small improvements that might reduce the impact of not being able to search for ^$ in conjunction with any string at all:
1. Use \r for inserting a paragraph mark/paragraph break instead of \n
2. Use \n for inserting a newline/line break
Rationale:
- OOo 2.4 can insert newlines: try a regex search and replace searching for \n (any number) and replacing it with & or $0 ($n if () are used in the search expression)
- OOo 2.4 can insert paragraph breaks already, but uses the wrong (illogical) expression for it, \n, which is already used for newlines.
If this part of the code today is remotely like what it AFAIR was in 1.x, using \r instead of \n for inserting a paragraph break seems to be a matter of replacing the string \n by \r in the corresponding places of the code (and the help files). Let's hope a real programmer beats me to finding, downloading and messing up a current snapshot of the code, although that part seems to be the easy part, after looking at this:
http://specs.openoffice.org/
http://wiki.services.openoffice.org/wiki/Specification
http://specs.openoffice.org/collaterals/template/2.0/OpenOffice-org-Specification-Template.ott
http://wiki.services.openoffice.org/wiki/The_Three_Golden_Rules_for_Writing_OpenOffice.org_Specifications
(etc. etc. etc. etc....)
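The proposed mapping can be sketched as a toy translation step for the replace box. This is purely hypothetical Python, not OOo code: `expand_replacement` is an invented name, and the Unicode separator characters are only stand-ins for Writer's internal paragraph and line breaks. Escaped backslashes (a literal `\\r`) are not handled.

```python
# Stand-ins only (assumption): Writer does not represent breaks this way.
PARAGRAPH_BREAK = "\u2029"  # Unicode PARAGRAPH SEPARATOR
LINE_BREAK = "\u2028"       # Unicode LINE SEPARATOR


def expand_replacement(replacement: str) -> str:
    """Translate the two-character escapes as proposed in the comment:
    backslash-r becomes a paragraph break, backslash-n a line break."""
    return (replacement
            .replace("\\r", PARAGRAPH_BREAK)
            .replace("\\n", LINE_BREAK))


# A replace-box string as the user would type it (raw, unexpanded).
typed = r"end of paragraph\rfirst line\nsecond line"
expanded = expand_replacement(typed)
print(repr(expanded))
```

With the two escapes kept distinct, a search for \n would no longer be replaced by something of a different kind when the same escape is typed in the replace box.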
On my OO.org 2.4 on Win32 (Italian translation), regexes do not work correctly. If you put [A-Z][a-z]* in the search box and & in the replace box, the replaced words have an "x" appended; e.g. this:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Maecenas tincidunt metus vitae tortor. Sed elementum luctus diam. Etiam lorem.
becomes
Loremx ipsum dolor sit amet, consectetuer adipiscing elit. Maecenasx tincidunt metus vitae tortor. Sedx elementum luctus diam. Etiamx lorem.
Note that even if you change the expressions around, e.g. using $0 instead of &, or ([A-Z][a-z]*) and $1, the bug does not disappear. With some different regexes (but still with "&" in the replace box) I also found that the text was replaced with the matched string only a few times, and then with an ampersand (&); changing the & to $0 didn't solve the problem (the words were replaced with $0). The issue seems to be Win32-only (I reproduced it on Windows XP SP3 and Windows 2000 SP4), since on Linux (Ubuntu 8.04) these regexes work fine.
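The expected behaviour, replacing each match with itself so that the text is unchanged, can be checked against another engine. The sketch below uses Python's `re`, where `\g<0>` plays the role that & or $0 plays in the OOo replace box; it is a comparison, not a reproduction of the Win32 bug.

```python
import re

text = ("Lorem ipsum dolor sit amet, consectetuer adipiscing elit. "
        "Maecenas tincidunt metus vitae tortor.")

# Replace every capitalised word with the whole match: a no-op.
whole_match = re.sub(r"[A-Z][a-z]*", r"\g<0>", text)

# Same thing via an explicit capture group and \1.
via_group = re.sub(r"([A-Z][a-z]*)", r"\1", text)

print(whole_match == text, via_group == text)
```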
OK, it has been fixed in 2.4.1.
SBA: Put myself on c/c.
*** Issue 84828 has been marked as a duplicate of this issue. ***
I'd like to support this issue, especially gudmund's improvement suggestion: "1. Use \r for inserting a paragraph mark/paragraph break instead of \n 2. Use \n for inserting a newline/line break" That would be a good step forward. But I'm also convinced, as sba wrote: "The many existing issues about Regular Expressions show that this area needs a "general rework" so that regular expressions may become a consistent, powerful and INTUITIVE tool for the user." - They are not yet!
*** Issue 76634 has been marked as a duplicate of this issue. ***
*** Issue 69534 has been marked as a duplicate of this issue. ***