Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Invalid
Affects Version/s: 1.8.1
Fix Version/s: None
Description
For the purpose of determining the position in text, the Japanese characters U+30FC (KATAKANA-HIRAGANA PROLONGED SOUND MARK) and U+3005 (IDEOGRAPHIC ITERATION MARK) are currently treated as simple diacritics. However, they appear to be full-fledged characters as far as text positioning is concerned.
As a result, some characters can actually end up reversed during text extraction (in particular, ーン can come out as ンー).
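A minimal Java sketch of the problem follows. It assumes, for illustration only, that the diacritic test classifies a single character by its Unicode general category (non-spacing mark, modifier symbol, or modifier letter). Both marks named above belong to the modifier-letter category (Lm), so a rule like this would treat them as diacritics. The class `DiacriticCheck`, the methods `isDiacriticNaive` and `isDiacritic`, and the exclusion set are hypothetical and are not taken from the attached patch.

```java
import java.util.Set;

public class DiacriticCheck {
    // Hypothetical exclusion set: Lm characters that occupy their own text position
    // and therefore must not be merged into the preceding character.
    private static final Set<Character> SPACING_EXCEPTIONS = Set.of(
            '\u30FC', // KATAKANA-HIRAGANA PROLONGED SOUND MARK
            '\u3005'  // IDEOGRAPHIC ITERATION MARK
    );

    // Assumed category-based rule: Mn, Sk and Lm characters count as diacritics.
    public static boolean isDiacriticNaive(char c) {
        int type = Character.getType(c);
        return type == Character.NON_SPACING_MARK
                || type == Character.MODIFIER_SYMBOL
                || type == Character.MODIFIER_LETTER;
    }

    // Same rule, but the two Japanese marks are treated as full characters.
    public static boolean isDiacritic(char c) {
        return isDiacriticNaive(c) && !SPACING_EXCEPTIONS.contains(c);
    }

    public static void main(String[] args) {
        // COMBINING ACUTE ACCENT, the two marks from this issue, and KATAKANA LETTER N
        char[] samples = {'\u0301', '\u30FC', '\u3005', '\u30F3'};
        for (char c : samples) {
            System.out.printf("U+%04X naive=%b fixed=%b%n",
                    (int) c, isDiacriticNaive(c), isDiacritic(c));
        }
    }
}
```

An explicit exclusion set keeps the change narrow: genuine combining marks such as U+0301 are still merged with their base character, while only the two listed marks are positioned as standalone characters.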
A patch to fix this is attached.