Issue 61958 - Fix UTF8 file access (and filename display) problems on Mac OS X
Summary: Fix UTF8 file access (and filename display) problems on Mac OS X
Status: CLOSED FIXED
Alias: None
Product: porting
Classification: Code
Component: MacOSX
Version: 680
Hardware: Mac Mac OS X, all
Importance: P3 Trivial with 2 votes
Target Milestone: OOo 2.0.3
Assignee: nospam4obr
QA Contact: issues@porting
URL: http://porting.openoffice.org/servlet...
Keywords:
Duplicates: 62908 65500
Depends on:
Blocks: 67331
Reported: 2006-02-12 19:06 UTC by moxfox
Modified: 2006-07-13 16:02 UTC (History)
CC List: 4 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Screenshot of broken UTF8 chars in OOo (23.78 KB, image/png)
2006-02-12 19:10 UTC, moxfox
Updated Patrick patch that fixes the access and display problems of non-ascii file and pathnames (5.13 KB, patch)
2006-03-09 07:29 UTC, moxfox
The patch disables osl_setThreadTextEncoding for MacOSX (660 bytes, patch)
2006-03-11 20:17 UTC, nospam4obr
New patch to fix setlocale("") fallback on OSX (819 bytes, patch)
2006-03-12 17:55 UTC, nospam4obr
A test document with non-ascii characters in the filename (2.72 KB, patch)
2006-03-22 07:06 UTC, moxfox

Description moxfox 2006-02-12 19:06:16 UTC
There has been some effort to fix UTF8 problems on Mac OS X with regard to
reading and writing files and directories with UTF8 characters in their names.
However, there are still problems in m156 (Maho's build) of OpenOffice.org X11
for Mac OS X.

1) The filenames are not displayed correctly (two weird characters instead of
one correct UTF8 character). See the screenshot below.

2) Saving a document with a new filename that contains UTF8 characters is not
possible, because OOo does not accept the filename as a valid name.

...

The characters used in testing are the Scandinavian a and o with dots: ä and ö.
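
The "two weird characters instead of one" symptom is what one would expect if UTF-8 bytes are interpreted as ISO-8859-1 somewhere along the way. The following is only a minimal sketch of that effect, not code from OOo; the helper names latin1ToUtf8 and countCodePoints are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <string>

// Re-encode a byte string as if each byte were one ISO-8859-1 character,
// yielding the UTF-8 text that a Latin-1 consumer would end up displaying.
// (Hypothetical helper for illustration only.)
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char c : bytes)
    {
        if (c < 0x80)
        {
            out += static_cast<char>(c);                 // ASCII is unchanged
        }
        else
        {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return out;
}

// Count code points in well-formed UTF-8 by skipping continuation bytes.
// (Hypothetical helper for illustration only.)
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char c : utf8)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}
```

With "ä" stored as the UTF-8 bytes 0xC3 0xA4, countCodePoints reports one character, while the Latin-1 misreading produced by latin1ToUtf8 contains two (U+00C3 and U+00A4).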
Comment 1 moxfox 2006-02-12 19:08:02 UTC
The web url above points to Patrick Luby's donated code that may help in fixing
this issue.
Comment 2 moxfox 2006-02-12 19:10:23 UTC
Created attachment 34097 [details]
Screenshot of broken UTF8 chars in OOo
Comment 3 moxfox 2006-03-09 07:29:43 UTC
Created attachment 34692 [details]
Updated Patrick patch that fixes the access and display problems of non-ascii file and pathnames
Comment 4 moxfox 2006-03-09 07:33:39 UTC
The attached patch fixes the issues with non-ascii filenames/paths in OOo.

You can test this by trying to save a new file with non-ascii characters in
the filename, such as "ä", "ö" or "é".

Is there any chance that this could be included in some "fixes" CWS?

This patch applies against m158 (with a small offset). I have built and
successfully run OOo with these changes.
Comment 5 nospam4obr 2006-03-10 07:20:42 UTC
Actually I would prefer to fix the solution from the previous effort (which
seems not to work for you) instead of applying the original patch.

The problem I see with this patch is that it will break the API in the valid
(but unused) case that someone sets a different thread locale for some reason.

On which OS version (Panther or Tiger) are you seeing this problem, and in which
locale?
Comment 6 moxfox 2006-03-10 14:35:34 UTC
obr: does the testcase I've described above work for you?

As the patch says, these fixes apply ONLY to filesystem operations, not to other
thread encoding stuff. The Mac always uses UTF8 as the filesystem encoding, so
there are no exceptions.

I'm using Tiger 10.4.5 with the (default) English locale. The "locale" command in
tcsh prints:
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"

So it's not that useful.

I don't mind if there is an alternative way to fix this, but I do want to get
this fixed sooner rather than later.

There are about 25 uses of osl_getThreadTextEncoding() in sal/osl/unx/, and
about 23 of them need UTF8 specified to get this to work.

Alternative approaches would be to change all the instances individually, or to
create an osl_getThreadTextEncoding_for_filesystem() that would return UTF8
on MACOSX and osl_getThreadTextEncoding() elsewhere.

But I don't think those approaches are any better...
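
For reference, the second alternative could look roughly like the sketch below. This is not the code of any attached patch. The typedef, the numeric constants and the osl_* function bodies are stand-ins written so the sketch compiles without the sal headers, and osl_getThreadTextEncoding_for_filesystem() is only the name proposed in this comment, not an existing function.

```cpp
// Stand-ins for the sal definitions so the sketch compiles on its own.
// The real type and constants come from the sal headers; the values
// used here should be treated as illustrative.
typedef unsigned short rtl_TextEncoding;
const rtl_TextEncoding RTL_TEXTENCODING_ISO_8859_1 = 12;
const rtl_TextEncoding RTL_TEXTENCODING_UTF8 = 76;

// Stand-in for the per-thread setting that the real osl functions manage.
static rtl_TextEncoding g_threadTextEncoding = RTL_TEXTENCODING_UTF8;

// Stand-in: stores the new encoding and returns the previous one.
rtl_TextEncoding osl_setThreadTextEncoding(rtl_TextEncoding encoding)
{
    const rtl_TextEncoding previous = g_threadTextEncoding;
    g_threadTextEncoding = encoding;
    return previous;
}

// Stand-in: returns whatever was set last.
rtl_TextEncoding osl_getThreadTextEncoding()
{
    return g_threadTextEncoding;
}

// Proposed helper (hypothetical): filesystem code would call this
// instead of osl_getThreadTextEncoding().
rtl_TextEncoding osl_getThreadTextEncoding_for_filesystem()
{
#ifdef MACOSX
    // Mac OS X always uses UTF-8 for file and path names.
    return RTL_TEXTENCODING_UTF8;
#else
    return osl_getThreadTextEncoding();
#endif
}
```

The thread encoding API stays untouched in this variant, at the cost of having to change each filesystem call site to use the new helper.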
Comment 7 nospam4obr 2006-03-10 15:55:54 UTC
mox: I haven't checked myself, but it was reported to work in de, zh and some
other locales.

If you scan through sal/systools/macxp_extras/x11osx/osxlocale.cxx, you'll find
that getProcessLocale always appends a '.UTF-8', which should cause
osl_getThreadTextEncoding() to return RTL_TEXTENCODING_UTF8 by default.

However, I can imagine that C.UTF-8 causes the code to fail - but I will verify.
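
The idea that a '.UTF-8' suffix on the locale name selects UTF-8 can be illustrated with a small sketch. It is an assumption-laden simplification, not the lookup that sal actually performs; codesetOf and localeImpliesUtf8 are hypothetical names used only here.

```cpp
#include <cctype>
#include <string>

// Extract the codeset part of a POSIX-style locale name of the form
// language[_territory][.codeset][@modifier]. Returns "" if there is none.
// (Hypothetical helper for illustration only.)
std::string codesetOf(const std::string& locale)
{
    const std::string::size_type dot = locale.find('.');
    if (dot == std::string::npos)
        return std::string();
    const std::string::size_type at = locale.find('@', dot);
    if (at == std::string::npos)
        return locale.substr(dot + 1);
    return locale.substr(dot + 1, at - dot - 1);
}

// True if the codeset names UTF-8, ignoring case and hyphens.
// (Hypothetical helper for illustration only.)
bool localeImpliesUtf8(const std::string& locale)
{
    std::string normalized;
    for (unsigned char c : codesetOf(locale))
        if (c != '-')
            normalized += static_cast<char>(std::toupper(c));
    return normalized == "UTF8";
}
```

Under this simplified rule, both "en_DE.UTF-8" and "C.UTF-8" count as UTF-8, whereas a bare "C" carries no codeset at all.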
Comment 8 nospam4obr 2006-03-10 19:38:43 UTC
mox: I can confirm your findings for German and English system settings. It seems
that something got broken or went missing.

BTW: the terminal always reports 'C' as locale - even with German system settings.
Comment 9 nospam4obr 2006-03-10 21:32:04 UTC
The real problem seems to be that the setlocale call in
vcl/unx/source/app/i18n_im.cxx fails, which results in an
osl_setThreadTextEncoding(RTL_TEXTENCODING_ISO_8859_1);

From there on, osl_getThreadTextEncoding() no longer returns UTF-8.
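
The failure mode described here, a setlocale call that is rejected and followed by a fallback, can be sketched with the standard C library alone. This is a generic illustration and not the code in i18n_im.cxx; setLocaleOrFallBackToC is a hypothetical name, and whether a particular locale name is accepted depends on the platform and the locales installed on it.

```cpp
#include <clocale>

// Try to activate the requested locale for all categories.
// Returns true if the C library accepted it; otherwise falls back to the
// always-available "C" locale and returns false, so the caller can decide
// what else (such as a text encoding) must be adjusted for the fallback.
// (Hypothetical helper for illustration only.)
bool setLocaleOrFallBackToC(const char* name)
{
    if (std::setlocale(LC_ALL, name) != nullptr)
        return true;
    std::setlocale(LC_ALL, "C");
    return false;
}
```

The point of returning a flag is that the fallback path is where an unrelated setting can be changed by accident, which is what this comment reports for the thread text encoding.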
Comment 10 nospam4obr 2006-03-11 19:57:53 UTC
mox: what is your "Region" setting (in the Formats tab of the International
preferences)? Mine was "Germany (English)", which resulted in a locale of en_DE
(which does not exist) and was the reason why setlocale failed.

After changing this to Germany (German) or US (English), the problem went away.
Even worse, I am unable to reproduce the original configuration in the
International preferences.
Comment 11 nospam4obr 2006-03-11 20:17:47 UTC
Created attachment 34758 [details]
The patch disables osl_setThreadTextEncoding for MacOSX
Comment 12 nospam4obr 2006-03-11 20:22:32 UTC
mox: can you please test if my patch works for you or if we need to append a
".UTF-8" to the SetSystemLocale parameter below. I still wasn't able to
reproduce the original setting :(. Thanks.
Comment 13 moxfox 2006-03-11 21:59:58 UTC
My Regional settings are "custom". They would normally be Finnish (i.e.
en_FI??), but I don't agree with the Mac's way of showing the clock times for
the Finnish region :)

How about changing the default encoding to UTF8 for locales that are not
recognized in any sane way?

While I could change my settings to comply with "correct" regions, it is not
possible to require this of all users: the Mac settings UI allows this
customization anyway.
...
Will look at your patch now...
Comment 14 nospam4obr 2006-03-11 22:31:49 UTC
The default encoding on MacOS X for OOo is already UTF-8 (the code in sal
actually returned en_DE.UTF-8 for my locale) - we only need to fix this fallback
in vcl not to change it to something different.
Comment 15 moxfox 2006-03-11 22:51:14 UTC
obr: yes, the patch fixes my problems in all places except in the X11 window
title. The window title still shows the non-UTF8 (i.e. doubled non-ASCII)
characters. The window title is not a critical thing, but it would be nice to
have the title show up properly.

Your patch looks elegant, but I would still feel more comfortable with a patch
like this:
+ #ifdef MACOSX // MacOS X always uses UTF-8 for the filesystem
+     osl_setThreadTextEncoding (RTL_TEXTENCODING_UTF8);
+ #else
      osl_setThreadTextEncoding (RTL_TEXTENCODING_ISO_8859_1);
+ #endif

...
(While fiddling with the international settings, I managed to make the problem
go away for a while (using English language, custom Finnish region), but now I
have managed to make the problem show up again. For testing purposes, choosing
e.g. Finnish language with an English->India region does the trick :)
Comment 16 nospam4obr 2006-03-12 17:55:03 UTC
Created attachment 34798 [details]
New patch to fix setlocale("") fallback on OSX
Comment 17 nospam4obr 2006-03-12 17:59:40 UTC
Thanks to your findings, I was able to reproduce the problem again - the attached
patch fixes both problems (the I/O and the window title) for me. Could you please
verify?
Comment 18 moxfox 2006-03-12 18:55:15 UTC
obr: Yes, your patch fixes all the problems in my build. So the patch is verified.

Thanks a lot, you're my hero of the day :)
Comment 19 nospam4obr 2006-03-12 21:29:36 UTC
Thanks. :) Patch committed in CWS pj51.
Comment 20 nospam4obr 2006-03-13 11:06:23 UTC
set target.
Comment 21 b.osi.ooo 2006-03-15 10:37:14 UTC
*** Issue 62908 has been marked as a duplicate of this issue. ***
Comment 22 jjmckenzie 2006-03-20 21:24:51 UTC
Did a build of OpenOffice 2.0.2 rc 4 with this code. Build finished without 
errors.  I have not tested UTF-8 documents/code.

James
Comment 23 moxfox 2006-03-22 07:06:58 UTC
Created attachment 35113 [details]
A test document with non-ascii characters in the filename
Comment 24 moxfox 2006-03-22 07:10:43 UTC
How to test:

Download the test document to your computer. Try opening it with OOo's Open File
dialog. The dialog should display the filename the same way it is shown in
Finder. The OOo window title should also show the same filename (not as in
the attached screenshot).

Lastly, try saving the document with a slightly altered filename. If it works,
then everything's ok.
Comment 25 jjmckenzie 2006-03-23 02:37:58 UTC
Verified with test file from Mox.

James
Comment 26 nospam4obr 2006-04-27 21:01:56 UTC
OK in m163.
Comment 27 nospam4obr 2006-05-18 05:47:07 UTC
*** Issue 65500 has been marked as a duplicate of this issue. ***