Apache OpenOffice (AOO) Bugzilla – Issue 95668
RTF: parenthesis in RTL mode imported as LTR character
Last modified: 2017-05-20 11:15:27 UTC
When the attached Hebrew RTF document is imported, the trailing parenthesis is misplaced. See screenshots. Note in the top line how the ending parenthesis has moved from the left edge to the right edge. Note also that the image on the top left was lost on import, and that the gray color in the logo on the top lines has become black.
Created attachment 57587 [details] bugdoc
Created attachment 57588 [details] How it looks in MS Word
Created attachment 57589 [details] How it looks in Writer
One solution may be using the method described in issue 18024, namely adding an RLM after trailing weak chars. I used such a method when fixing a similar PowerPoint export bug for Issue 39516.
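For illustration only, the RLM idea could be sketched roughly as follows in Python. This is not the code from issue 18024 or Issue 39516. The function name add_trailing_rlm is hypothetical, and the sketch treats every character that is not strongly directional (bidi class L, R or AL) as "weak", which is an assumption. It expects the text of one paragraph without its terminator.

```python
import unicodedata

RLM = "\u200f"  # RIGHT-TO-LEFT MARK

STRONG = {"L", "R", "AL"}   # strongly directional bidi classes
STRONG_RTL = {"R", "AL"}    # Hebrew letters are R, Arabic letters are AL


def add_trailing_rlm(text):
    """Append an RLM when text ends in non-strong characters after RTL text.

    Hypothetical helper: "non-strong" here means any bidi class other than
    L, R or AL, so parentheses, punctuation, digits and spaces all count.
    """
    # Walk back over the trailing non-strong characters.
    i = len(text)
    while i > 0 and unicodedata.bidirectional(text[i - 1]) not in STRONG:
        i -= 1

    # Nothing trailing (or empty text), or no strong character at all.
    if i == len(text) or i == 0:
        return text

    # Only anchor the trailing characters if the last strong one is RTL.
    if unicodedata.bidirectional(text[i - 1]) in STRONG_RTL:
        return text + RLM
    return text
```

Because the RLM is itself a strong RTL character, running the helper a second time on its own output leaves the text unchanged.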
MRU->HBRINKM: see the attached document. In the header at the beginning of the document, Writer imports one of the parentheses as an LTR character. Manually formatting the paragraph as RTL displays the parenthesis correctly at the end of the paragraph.
Although the import results differ, examining the bugdoc reveals that the problematic paragraph has LTR direction. So this is definitely not an RTF import issue. The difference is that Word displays the trailing parenthesis "correctly" even when the paragraph direction is "wrong". The way the parenthesis is displayed is the expected behavior for LTR paragraphs according to Writer's existing functionality.
hennerdrewes is correct. On the one hand, we want the document to look like it does in Word. On the other, do we want to imitate Word's bizarre features? If the next version of Word fixes this quirk, then we'll have to remember to change it back here, too.
@ayaniger: We really should look at this problem in a broader scope, as several similar issues have come up in the past. Because of the additional directional information that MS uses, similar problems may occur whenever there are weak characters on the boundary of RTL and LTR runs and the paragraph direction of OOo flips them to the opposite direction. So theoretically speaking, there are really a lot of cases where this matters. In practice, on the other hand, cases like the one in the current bugdoc with trailing weak characters are probably the most common. To be most precise, the import (and this is probably necessary for all MS format imports, RTF, DOC, DOCX, etc., and even for copy/pasting text) should evaluate the extra directional information and compare it to our bidi behavior. In simple cases like the current one, the best fix would be to import the paragraph as RTL, because this is what is logically intended. (It is just badly formatted.) If possible, this should be preferred to cluttering the text with RLMs. In more exotic cases with changes between LTR and RTL runs, it might be inevitable to insert RLMs or LRMs. To sum up, we will probably need to clarify this translation process in general before it can be implemented in the different filters. But this could dramatically improve the import fidelity of OOo.
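A rough sketch of the "simple case" decision, in Python for illustration only. The function name suggest_paragraph_direction and the rule itself are assumptions, not existing filter code: a paragraph declared LTR is switched to RTL only when every strong character in it is RTL. The sketch looks at the text alone and ignores the extra run-level directional information that the MS formats carry, which a real filter would need to evaluate.

```python
import unicodedata


def suggest_paragraph_direction(text, declared="ltr"):
    """Return "rtl" for a declared-LTR paragraph whose strong text is all RTL.

    Hypothetical heuristic: in every other case the declared direction is
    kept, and mixed paragraphs would still need RLM/LRM handling elsewhere.
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    has_rtl = any(c in ("R", "AL") for c in classes)  # Hebrew or Arabic letters
    has_ltr = any(c == "L" for c in classes)          # e.g. Latin letters

    if declared == "ltr" and has_rtl and not has_ltr:
        return "rtl"
    return declared
```

Under this rule a paragraph that mixes Hebrew and Latin text keeps its declared direction, so the more exotic cases are deliberately left alone.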
Reset assignee to the default "issues@openoffice.apache.org".