Apache OpenOffice (AOO) Bugzilla – Issue 100737
Signed numbers displayed incorrectly in RTL Calc
Last modified: 2017-05-20 11:11:36 UTC
To reproduce the bug:
a) set SAL_RTL_ENABLED to "TRUE"
b) start OOo and open a spreadsheet
c) type "-3" in a cell
d) the number appears as "3-"
This is a regression bug. The behavior is fine in 2.3.1. Beginning in 2.4, the bug appears.
Has anyone looked at this bug? This is a very serious problem for RTL users. In Impress, "-3" appears ok if the text direction is LTR. Change the text dir to RTL, and you'll see "3-". In Calc, it shows up as "3-" even if the text dir is LTR, as long as the UI language is RTL. Is this a problem in the edit engine? Is it a problem in the calc code? It seems that the regression occurred between 2.3.1 and 2.4. Where in the code should I start to look in order to fix it? TIA for feedback.
On which platforms does this occur? Maybe Unix/Linux only or is it Windows as well?
confirmed on OOO310_m9 on UNX
Hi Niklas, please have a look
ICU's BiDi algorithm claims that the correct visual order for signed numbers in RTL contexts is digits_left sign_right (e.g. 123+). Writer also does it that way, but the EditEngine is different (don't know why yet. TL?).
This may be related to the change in the bidi type of minus hyphen from Unicode 4.0 to Unicode 4.0.1, as described in issue 57833. Eike wrote that as of m197, we're using Unicode 5.0.0.
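For illustration only (this is not OOo code), the bidi classes under discussion can be inspected with Python's standard unicodedata module. The helper name bidi_classes is made up for this sketch, and the values reflect the Unicode version bundled with the Python interpreter, which is not necessarily the version OOo uses.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Hyphen-minus followed by a European digit.
print(bidi_classes("-3"))
# The same string preceded by a strong left-to-right mark.
print(bidi_classes(LRM + "-3"))
```

Since Unicode 4.0.1 the hyphen-minus is classed as a European separator (ES), so in an RTL context it is resolved as a neutral rather than staying attached to the digits that follow it.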
Though Writer places the sign to the right, and the digits to the left, popular usage (at least for Hebrew) is otherwise. A user who types the number "-3" generally expects to see the sign on the left. At present, in Calc, the sign is always to the right, and it can't be changed without using formatting characters.
-->tl: This occurs on both Linux and Windows.
So, according to the latest Unicode standard and to issue 57833, the numbers are laid out correctly. If the popular usage of signed numbers in RTL-enabled spreadsheet applications is different from the Unicode standard, then Calc's number formatter should be adjusted to it. Maybe by inserting BiDi markers, maybe by using related plus/minus codepoints that have better-matching BiDi properties.
I guess this is _only_ for Hebrew, isn't it? Not other RTL locales?
we should ask some native speakers...
Here is a patch which almost solves the problem for Hebrew, and leaves handling for other RTL languages as it is. It sets all of the number formats for Hebrew to display the minus sign to the left, by adding an LRM and a minus sign before a negative number. Unfortunately, I can't modify the number format "Default", which will still display the minus sign to the right. For the Hebrew version, we can deal with this in an ugly way by including a default template which sets the default number format to a format other than "Default". But I would prefer if there was a way that I could get the format "Default" to also display the minus sign to the left. Suggestions are welcome.
Created attachment 61982 [details] Patch for Hebrew number formats
In Arabic, the algebraic sign is, as in French or German, always on the left side of the number, e.g. "-3", "-12321", NEVER "3-". Arabic as a script is RTL, but numbers in the text, including algebraic signs, are "LTR", so the sign is always on the left side of the number. http://ar.wikipedia.org/wiki/%D8%A3%D8%B9%D8%AF%D8%A7%D8%AF_%D8%B3%D8%A7%D9%84%D8%A8%D8%A9_%D9%88%D9%85%D9%88%D8%AC%D8%A8%D8%A9
ayaniger->farzanehs: Does Persian place the minus sign to the right of a number or to the left?
-->cmu Here's a generic RTL patch, which will work for Arabic as well as Hebrew. It supersedes the previous Hebrew patch I posted. -->er So far, we know that Hebrew and Arabic place the minus sign to the left, and I'm still waiting for answers from the Urdu, Thai, and Persian project leads. I'll post the patch, so it can be integrated when you think we have enough of a consensus.
Created attachment 62363 [details] In RTL OOo, places an LRM before an opening minus sign in number formats
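A minimal sketch of the idea behind the patch, assuming the formatted number is already available as a string. This is not the attached patch, which changes the number format codes; the function name prefix_minus_with_lrm is hypothetical.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def prefix_minus_with_lrm(formatted):
    """Put an LRM before a leading minus sign.

    The strong left-to-right character is meant to keep the sign on the
    left of the digits when the string is laid out in an RTL context.
    Strings without a leading minus are returned unchanged.
    """
    if formatted.startswith("-"):
        return LRM + formatted
    return formatted


print(repr(prefix_minus_with_lrm("-3")))
print(repr(prefix_minus_with_lrm("3")))
```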
Thai is LTR CTL so this issue does not apply.
Behdad Esfahbod has written me that Persian, like Hebrew and Arabic, places the minus sign on the left. I think we have enough of a consensus now to put the minus sign on the left by default for RTL, and use the patch I've posted.
added the PATCH flag
Though the patch in the number formatter (or the modified format codes) may cure the primary symptom, I doubt it is what we want; it might create new problems. Prefixing the raw formatted string with an LRM would not only insert the LRM for display purposes, but would also include it in every other string operation, such as copy&paste via clipboard and writing to document files. Parsing such a string may or may not work, depending on whether the target application ignores an LRM. Resetting issue from PATCH to DEFECT for this reason. Instead, the LRM could be inserted only if the string is to be displayed or printed. For Calc, that could be in ScDrawStringsVars::SetText() if the original cell data is numeric, I guess.
If the change is to be made for each application where the symptom occurs, this would have to be fixed not only in Calc, but also in Writer tables with number recognition. I'm wondering though how serious a concern it should be that a target application would ignore an LRM. Wouldn't it be fair to assume that any application which would be Unicode-aware enough to reverse the order of "-3" because of the new Unicode Bidi properties of minus/hyphen would be Unicode-aware enough to support an LRM?
> If the change is to be made for each application where the symptom occurs, this
> would have to be fixed not only in calc, but also in Writer tables with number
> recognition.

Yes. And Impress tables, Chart data tables, and maybe more. The actual logic and insertion could still be done in the number formatter, with the method getting passed an argument that says whether it should insert an LRM.

> I'm wondering though how serious a concern it should be that a target
> application would ignore an LRM. Wouldn't it be fair to assume that any
> application which would be Unicode-aware enough to reverse the order of "-3"
> because of the new Unicode Bidi properties of minus/hyphen would be
> Unicode-aware enough to support an LRM?

Not necessarily. I wasn't referring to reversal of "-3". For a simple example, take the export of numbers to CSV: you'd have a field content of <LRM>-3. An application parsing that file may not recognize the data as numeric if it does not ignore the LRM character. When copying data via the clipboard, the pasting application inserts the data as is, so in an RTL environment you'd end up with the logical sequence ABC <LRM>Data DEF. Now, how is that supposed to be displayed? I don't think the LRM belongs in the data, because it is a mere display option.
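The CSV concern can be reproduced with a small sketch. Python's float() stands in here for "an application that does not ignore the LRM", and format_number with its for_display argument is a hypothetical stand-in for the formatter method mentioned above, not an existing OOo API.

```python
import csv
import io

LRM = "\u200e"  # LEFT-TO-RIGHT MARK


def format_number(value, for_display=False):
    """Format a number; add the LRM only when the text is for display."""
    text = str(value)
    if for_display and text.startswith("-"):
        return LRM + text
    return text


def parses_as_number(field):
    """Return True if a strict parser (here: float) accepts the field."""
    try:
        float(field)
        return True
    except ValueError:
        return False


# Write one display-oriented string and one plain data string to CSV.
buf = io.StringIO()
csv.writer(buf).writerow(
    [format_number(-3, for_display=True), format_number(-3)]
)

# Read the row back and check which fields still parse as numbers.
fields = next(csv.reader(io.StringIO(buf.getvalue())))
print([parses_as_number(f) for f in fields])
```

In this sketch, only the field written without the LRM is accepted as numeric, which is the argument for keeping the mark out of stored or exported data.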
This problem still appears at OpenOffice.org 3.2.1 (OOO320m19).
Reset assignee to the default "issues@openoffice.apache.org".