Apache OpenOffice (AOO) Bugzilla – Issue 61958
Fix UTF8 file access (and filename display) problems on Mac OS X
Last modified: 2006-07-13 16:02:59 UTC
There has been some effort to fix UTF8 problems on Mac OS X with regard to reading and writing files and directories with UTF8 characters in their names. However, there are still problems in m156 (Maho's build) of OpenOffice.org X11 for Mac OS X:
1) Filenames are not displayed correctly (two weird characters instead of one correct UTF8 character). See screenshot below.
2) Saving a document with a new filename that contains UTF8 characters is not possible, because OOo does not accept the filename as a valid name.
...
The characters used in testing are the Scandinavian a and o with dots: ä and ö.
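The "two weird characters instead of one" symptom is what typically appears when UTF-8 bytes are decoded as a single-byte encoding such as ISO-8859-1. The following is a minimal sketch of that effect, not OOo code; the helper names latin1ToUtf8 and countCodePoints are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of OOo): treat every byte of `bytes` as one
// ISO-8859-1 character and re-encode the result as UTF-8.
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII stays one byte
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // two-byte lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}

// Count UTF-8 code points by skipping continuation bytes (10xxxxxx).
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char c : utf8)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For example, "ä" is the two bytes C3 A4 in UTF-8. Passing those bytes through latin1ToUtf8 yields the bytes C3 83 C2 A4, which display as the two characters "Ã¤".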
The web url above points to Patrick Luby's donated code that may help in fixing this issue.
Created attachment 34097 [details] Screenshot of broken UTF8 chars in OOo
Created attachment 34692 [details] Updated Patrick patch that fixes the access and display problems of non-ascii file and pathnames
The attached patch fixes the issues with non-ascii filenames/paths in OOo. You can test this by trying to save a new file with non-ascii characters in the filename, like "ä,ö,é" etc. Any chance that this could be included in some "fixes" cws? This patch applies against m158 (with a small offset). I have built and successfully run OOo with these changes.
Actually, I would prefer to fix the solution from the previous effort (which seems not to work for you) instead of applying the original patch. The problem I see with this patch is that it will break the API in the valid (but unused) case that someone sets a different thread locale for some reason. On which OS version (Panther or Tiger) are you seeing this problem, and in which locale?
obr: does the testcase I've described above work for you? As the patch says, these fixes apply ONLY to filesystem operations, not to other thread encoding stuff. The Mac always has UTF8 as the filesystem encoding, so there are no exceptions.
I'm using Tiger 10.4.5 with the (default) English locale. The "locale" command in tcsh prints:
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"
So it's not that useful.
I don't mind if there is an alternative way to fix this, but I do want to get this fixed sooner rather than later. There are about 25 uses of osl_getThreadTextEncoding() in sal/osl/unx/, and about 23 of them need UTF8 specified to get this to work. Alternative approaches would be to change all the instances individually, or to create an osl_getThreadTextEncoding_for_filesystem() that would return UTF8 for MACOSX and osl_getThreadTextEncoding() for others. But I don't think those approaches are any better...
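For illustration, the osl_getThreadTextEncoding_for_filesystem() alternative mentioned above could look roughly like the sketch below. It is self-contained and does not use the real sal headers: everything in the sketch namespace (the TextEncoding type, the two constants with placeholder values, and the getter/setter) is a hypothetical stand-in for rtl_TextEncoding, RTL_TEXTENCODING_*, and the osl thread encoding functions. Only the MACOSX macro is taken from the thread.

```cpp
namespace sketch {

// Stand-ins so the sketch compiles on its own; the values are placeholders,
// not the ones defined by sal.
typedef unsigned short TextEncoding;
const TextEncoding ENCODING_ISO_8859_1 = 1;
const TextEncoding ENCODING_UTF8 = 2;

// MACOSX is the macro used by the OOo build on Mac OS X.
#ifdef MACOSX
const bool kIsMacOSX = true;
#else
const bool kIsMacOSX = false;
#endif

// Stand-in for the per-thread encoding state kept by osl.
TextEncoding g_threadEncoding = ENCODING_ISO_8859_1;

TextEncoding getThreadTextEncoding() { return g_threadEncoding; }
void setThreadTextEncoding(TextEncoding e) { g_threadEncoding = e; }

// Sketch of the proposed helper: filesystem code always sees UTF-8 on
// Mac OS X and follows the thread encoding everywhere else.
TextEncoding getThreadTextEncodingForFilesystem()
{
    return kIsMacOSX ? ENCODING_UTF8 : getThreadTextEncoding();
}

} // namespace sketch
```

With this shape, the roughly 23 filesystem call sites would call the new helper, while the remaining callers keep using the thread encoding unchanged.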
mox: I haven't checked myself, but it was reported to work in de, zh and some other locales. If you scan through sal/systools/macxp_extras/x11osx/osxlocale.cxx, you'll find that getProcessLocale always appends a '.UTF-8', which should cause osl_getThreadTextEncoding() to return RTL_TEXTENCODING_UTF8 by default. However, I can imagine that C.UTF-8 causes the code to fail - but I will verify.
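To make the role of the '.UTF-8' suffix concrete: a POSIX-style locale name has the form language_TERRITORY.codeset@modifier, and the codeset part is what determines the text encoding. The sketch below only parses such a name; it is not the code in osxlocale.cxx or in sal, and the function name codesetOf is made up.

```cpp
#include <string>

// Hypothetical helper: return the codeset part of a POSIX-style locale name
// ("en_DE.UTF-8" -> "UTF-8"), or an empty string if the name has none.
std::string codesetOf(const std::string& locale)
{
    const std::string::size_type dot = locale.find('.');
    if (dot == std::string::npos)
        return std::string();
    const std::string::size_type at = locale.find('@', dot);
    // substr clamps the count, so npos here means "up to the end of the name".
    return locale.substr(dot + 1,
                         at == std::string::npos ? at : at - dot - 1);
}
```

Under this reading, "en_DE.UTF-8" and "C.UTF-8" both carry the codeset "UTF-8", while a bare "C" carries none.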
mox: I can confirm your findings for German and English system settings. It seems that something got broken or went missing. BTW: the terminal always reports 'C' as the locale - even with German system settings.
The real problem seems to be that the setlocale call in vcl/unx/source/app/i18n_im.cxx fails, which results in a call to osl_setThreadTextEncoding(RTL_TEXTENCODING_ISO_8859_1). From there on, osl_getThreadTextEncoding() no longer returns UTF-8.
mox: what is your "Region" setting (in the Formats tab of the International preferences)? Mine was "Germany (English)", which resulted in a locale of en_DE (which does not exist) and was the reason why setlocale failed. After changing this to Germany (German) or US (English), the problem went away. Even worse, I am unable to reproduce the original configuration in the International preferences.
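The failure mode described here, setlocale rejecting a locale name the C library does not know, can be sketched with the standard C++ wrapper for setlocale. This is an illustration only and not the code in i18n_im.cxx; the function name applyLocaleOrFallback is made up, and whether a particular name such as "en_DE.UTF-8" is accepted depends on the locales installed on the system.

```cpp
#include <clocale>
#include <string>

// Hypothetical helper: try to activate `name` for all categories. If the C
// library rejects it (setlocale returns a null pointer), fall back to "C".
// Returns true only if the requested locale was actually applied.
bool applyLocaleOrFallback(const std::string& name)
{
    if (std::setlocale(LC_ALL, name.c_str()) != nullptr)
        return true;
    std::setlocale(LC_ALL, "C");  // the "C" locale is always available
    return false;
}
```

The point discussed in this issue is what happens in the false branch: the fallback taken there should not also switch the thread text encoding away from UTF-8 on Mac OS X.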
Created attachment 34758 [details] The patch disables osl_setThreadTextEncoding for MacOSX
mox: can you please test if my patch works for you or if we need to append a ".UTF-8" to the SetSystemLocale parameter below. I still wasn't able to reproduce the original setting :(. Thanks.
My Regional settings are "custom". They would normally be Finnish (i.e. en_FI??), but I don't agree with the Mac's way of showing the clock times for the Finnish region :) How about changing the default encoding to UTF8 for locales that are not recognized in any sane way? While I could change my settings to comply with "correct" regions, it is not possible to require this of all users: the Mac settings UI allows this customization anyway. ... Will look at your patch now...
The default encoding on MacOS X for OOo is already UTF-8 (the code in sal actually returned en_DE.UTF-8 for my locale) - we only need to fix this fallback in vcl not to change it to something different.
obr: yes, the patch fixes my problems in all places except the X11 window title. The window title still shows the non-UTF8 (i.e. doubled non-ASCII) characters. The window title is not a critical thing, but it would be nice to have the title show up properly. Your patch looks elegant, but I would still feel more comfortable with a patch like this:
+ #ifdef MACOSX // MacOS X always uses UTF-8 for the filesystem
+ osl_setThreadTextEncoding (RTL_TEXTENCODING_UTF8);
+ #else
  osl_setThreadTextEncoding (RTL_TEXTENCODING_ISO_8859_1);
+ #endif
...
(While fiddling with the international settings, I managed to make the problem go away for a while (using English language, custom Finnish region), but now I have managed to make the problem show up again. For testing purposes, setting e.g. Finnish language with the English->India region does the trick :) )
Created attachment 34798 [details] New patch to fix setlocale("") fallback on OSX
Thanks to your findings, I was able to reproduce the problem again - the attached patch fixes both problems (the I/O and the window title) for me. Could you please verify?
obr: Yes, your patch fixes all the problems in my build. So the patch is verified. Thanks a lot, you're my hero of the day :)
Thanks. :) Patch committed in CWS pj51.
set target.
*** Issue 62908 has been marked as a duplicate of this issue. ***
Did a build of OpenOffice 2.0.2 rc 4 with this code. Build finished without errors. I have not tested UTF-8 documents/code. James
Created attachment 35113 [details] A test document with non-ascii characters in the filename
How to test: Download the test document to your computer. Try opening it with OOo's Open File dialog. The dialog should display the filename the same way it is shown in Finder. The OOo window title should also show the same filename (not as in the attached screenshot). Lastly, try saving the document with a slightly altered filename. If that works, then everything's ok.
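As a complement to the manual steps above, a byte-level check outside OOo can confirm that the filesystem itself accepts such a filename. This is a minimal sketch that only exercises plain C++ file streams; it says nothing about OOo's dialogs or window title, and the function name roundTripFile is made up.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: write `content` to `path`, read it back, delete the
// file, and report whether the content survived the round trip.
bool roundTripFile(const std::string& path, const std::string& content)
{
    {
        std::ofstream out(path.c_str(), std::ios::binary);
        if (!out)
            return false;            // file could not be created
        out << content;
    }                                // closing the stream flushes the data

    std::string readBack;
    bool opened = false;
    {
        std::ifstream in(path.c_str(), std::ios::binary);
        opened = static_cast<bool>(in);
        if (opened)
            std::getline(in, readBack, '\0');  // read up to EOF (no NULs expected)
    }

    std::remove(path.c_str());       // clean up the test file
    return opened && readBack == content;
}
```

A call such as roundTripFile("/tmp/t\xC3\xA4st-\xC3\xB6.txt", "hello") uses the UTF-8 bytes for ä and ö in the filename, matching the characters used in this issue.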
Verified with test file from Mox. James
OK in m163.
*** Issue 65500 has been marked as a duplicate of this issue. ***