Apache OpenOffice (AOO) Bugzilla – Issue 42171
Display of invalid Thai combining character sequences broken on Windows
Last modified: 2017-05-20 11:29:19 UTC
A combining character sequence such as gor gai + mai ek + sara ii (0e01+0e48+0e35) is not displayed properly on Windows. It should be displayed as gor gai with the mai ek, followed by a dotted circle with sara ii. It *is* displayed this way on Linux. On Windows, with the old Windows Thai fonts such as Angsana and Browallia, an ugly black box is shown, and it is not clear that there is a sara ii there. Much more seriously, with more recent fonts such as Tahoma, the sara ii does not show up at all.

The combining character sequences that are not displayed properly are sequences that Windows cannot display in a single cell. Such sequences never occur in correct Thai. Conventionally, most applications on Windows prevent the input of such invalid sequences. However, OOo does not always do this, and in any case such sequences can occur in imported data. It is important that such sequences be highly visible to the user so that the user can correct them.

Test case:
1) Load the attached document (with invalid combining character sequences) on Linux. The display uses dotted circles to ensure that all combining characters in invalid combining character sequences are clearly displayed. See the first screenshot attached.
2) Load the same document on Windows. You will not see any dotted circles, so you will not know that this document has errors in it. See the second screenshot.
3) Reformat the document to use the font Angsana (or Browallia or other Windows Thai fonts). You will see black boxes where there are invalid combining character sequences. See the third screenshot. This lets you know that there are errors, but you cannot tell what the errors are. If you use Tahoma, Microsoft Sans Serif or Lucida Sans Unicode (which have a glyph for the dotted circle) instead, there are no black boxes, but there are no dotted circles either.
Created attachment 22278 [details] Text document with invalid Thai combining character sequences
Created attachment 22279 [details] Screenshot of the document displayed correctly on Linux
Created attachment 22280 [details] Screenshot of the document displayed on Windows
Created attachment 22281 [details] Screenshot of the document displayed on Windows, reformat to use Angsana
Hi Karl, it seems that for some reason the iterator is broken (only under Windows?). Can you please check whether this can be fixed or whether this is a font-specific matter (just a wild guess, though)? Thx in advance.
Karl: This is not a breakiterator issue but a layout engine issue. Linux and Windows use different engines: Windows uses the native Uniscribe, while Linux uses the ICU layout engine. To prevent invalid sequences from being entered, we do have input sequence checking, but it is broken. I will create a new issue to fix the broken input sequence checking and transfer this one to Herbert for fixing the layout engine.
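For readers unfamiliar with input sequence checking, here is a minimal sketch of the idea in Python. It is not OOo's implementation: the character classes and the rule (a vowel mark only after a consonant, a tone mark only after a consonant or a vowel mark) are simplifying assumptions, and the name accept_keystroke is hypothetical. Other Thai combining marks are not checked at all here.

```python
# Hypothetical, simplified character classes; a real checker uses fuller tables.
CONSONANTS = {chr(c) for c in range(0x0E01, 0x0E2F)}                # U+0E01..U+0E2E
VOWEL_MARKS = {"\u0e31"} | {chr(c) for c in range(0x0E34, 0x0E3A)}  # above/below vowels
TONE_MARKS = {chr(c) for c in range(0x0E48, 0x0E4C)}                # mai ek..mai chattawa


def accept_keystroke(prev: str, new: str) -> bool:
    """Return True if `new` may be typed directly after `prev`.

    `prev` is the character before the cursor, or "" at the start of the text.
    """
    if new in VOWEL_MARKS:
        return prev in CONSONANTS
    if new in TONE_MARKS:
        return prev in CONSONANTS or prev in VOWEL_MARKS
    return True  # everything else is accepted in this sketch


if __name__ == "__main__":
    # gor gai + mai ek is accepted; sara ii typed after mai ek is rejected.
    print(accept_keystroke("\u0e01", "\u0e48"))
    print(accept_keystroke("\u0e48", "\u0e35"))
```

A check like this only guards typed input. It does nothing for sequences that are already in a loaded document, which is the case this issue is about.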
Can reproduce.
Unfortunately we are 100% compatible here with an important legacy application from a major competitor, because we use the same layout engine... so the problem is in the Uniscribe library which is outside OOo's scope. Thanks for the great bugdocs and the excellent bug report which made reproducing the problem easy.
Thanks for looking into this issue. So if I understand correctly, the situation is that:
a) Uniscribe has a bug/limitation: it displays invalid combining character sequences poorly
b) OOo sometimes gives Uniscribe invalid combining character sequences to display

I don't think it follows from this that nothing needs changing in OOo. For example, if the document contains 0e01+0e48+0e35, which Uniscribe cannot display properly, the OOo display engine might transform that to 0e01+0e48+25cc+0e35 before giving it to Uniscribe to display. Alternatively, the Sequence Input Checking could be made more rigorous on Windows so that it is impossible for the user to enter such invalid sequences (which I believe is the case with some competitor products).

The current situation may well be Uniscribe's fault, but it is not an acceptable situation for OOo Thai users on Windows, and I find it hard to believe that there is nothing OOo can do to improve the situation.
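To make the suggested transformation concrete (0e01+0e48+0e35 becoming 0e01+0e48+25cc+0e35), here is a rough Python sketch. It assumes a simplified notion of a valid cell: a consonant, then at most one above/below vowel mark, then at most one tone mark. The function name insert_dotted_circles and the character classes are illustrative assumptions, not OOo or Uniscribe code, and other Thai combining marks are passed through unchecked.

```python
DOTTED_CIRCLE = "\u25cc"

# Hypothetical, simplified character classes (assumption, not a full Thai table).
CONSONANTS = {chr(c) for c in range(0x0E01, 0x0E2F)}                # U+0E01..U+0E2E
VOWEL_MARKS = {"\u0e31"} | {chr(c) for c in range(0x0E34, 0x0E3A)}  # above/below vowels
TONE_MARKS = {chr(c) for c in range(0x0E48, 0x0E4C)}                # mai ek..mai chattawa


def insert_dotted_circles(text: str) -> str:
    """Put U+25CC in front of every combining mark that cannot join the current cell."""
    out = []
    state = "none"  # one of: "none", "base", "vowel", "tone"
    for ch in text:
        if ch in CONSONANTS:
            state = "base"
        elif ch in VOWEL_MARKS:
            if state != "base":                   # vowel mark without a free base
                out.append(DOTTED_CIRCLE)
            state = "vowel"
        elif ch in TONE_MARKS:
            if state not in ("base", "vowel"):    # tone mark without a base
                out.append(DOTTED_CIRCLE)
            state = "tone"
        else:
            state = "none"
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    shown = insert_dotted_circles("\u0e01\u0e48\u0e35")
    print(" ".join(f"{ord(c):04x}" for c in shown))  # 0e01 0e48 25cc 0e35
```

A valid sequence such as 0e01+0e35+0e48 passes through unchanged, so only the problem sequences gain a dotted circle.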
Ok, it is possible to work around the issue by changing invalid sequences to valid ones.
HDU->FME: please work with Karl to convert invalid character sequences into valid ones...
FME->FT: And finally back to you. I think this means we should implement a "type and replace" feature for sequence input checking, as known from a competitor. In this case we need a more detailed description of the functionality of this feature.
.
"Type and replace" is issue 42661. That is a separate (although related) issue. "Type and replace" is about how to prevent invalid combining character sequences getting into your document. The issue here is what happens if your document contains an invalid combining character sequence; that can happen when you load a document or when you turn off sequence input checking and "type and replace". In order to display invalid combining character sequences with Uniscribe, it is necessary to transform invalid combining character sequences to sequences that can be displayed by Uniscribe (e.g. by inserting dotted circle glyphs) as part of the display process; this wouldn't change the logical content of the document, which would still contain invalid combining character sequences.
I'm wondering why Uniscribe doesn't support displaying invalid combining character sequences. It is described here: http://www.microsoft.com/typography/otfntdev/thaiot/shaping.aspx#comb and http://www.microsoft.com/typography/OpenType%20Dev/arabic/shaping.mspx#invalid and http://www.microsoft.com/typography/OpenType%20Dev/lao/shaping.mspx#invalid. Maybe it is implemented for every CTL language mentioned here: http://www.microsoft.com/typography/SpecificationsOverview.mspx
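The display-only transformation discussed earlier in this thread (dotted circles inserted for rendering while the logical content stays as it is) could be sketched roughly as below. This is an illustration under assumptions, not OOo or Uniscribe code: the cell rule is simplified to consonant, optional vowel mark, optional tone mark, and to_display and its index map are hypothetical names. The map records which logical character each displayed character came from, with None for an inserted dotted circle, so cursor positions and selections could still be translated back to the document text.

```python
from typing import List, Optional, Tuple

DOTTED_CIRCLE = "\u25cc"

# Hypothetical, simplified character classes (assumption, not a full Thai table).
CONSONANTS = {chr(c) for c in range(0x0E01, 0x0E2F)}                # U+0E01..U+0E2E
VOWEL_MARKS = {"\u0e31"} | {chr(c) for c in range(0x0E34, 0x0E3A)}  # above/below vowels
TONE_MARKS = {chr(c) for c in range(0x0E48, 0x0E4C)}                # mai ek..mai chattawa


def to_display(logical: str) -> Tuple[str, List[Optional[int]]]:
    """Return (display_text, source), where source[i] is the logical index of
    display character i, or None if that character is an inserted dotted circle."""
    display: List[str] = []
    source: List[Optional[int]] = []
    state = "none"  # one of: "none", "base", "vowel", "tone"
    for i, ch in enumerate(logical):
        if ch in VOWEL_MARKS:
            orphan, state = state != "base", "vowel"
        elif ch in TONE_MARKS:
            orphan, state = state not in ("base", "vowel"), "tone"
        else:
            orphan, state = False, ("base" if ch in CONSONANTS else "none")
        if orphan:
            display.append(DOTTED_CIRCLE)  # display-only placeholder base
            source.append(None)
        display.append(ch)
        source.append(i)
    return "".join(display), source


if __name__ == "__main__":
    logical = "\u0e01\u0e48\u0e35"           # stays untouched in the document
    display, source = to_display(logical)
    print([f"{ord(c):04x}" for c in display], source)
```

Dropping the entries whose source is None gives back the original logical text, which is the property asked for above: the document still contains the invalid sequence, and only the rendering changes.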
FT: Back to you, Samphan. For the moment I do not see that we can do such a thing without help from outside. Please provide a spec and patch/code first. Please do not assign this issue to me again, since I'm leaving this position. Thx.
Can any Windows user confirm whether this still occurs in the latest OOo?
Reset assignee to the default "issues@openoffice.apache.org".