Right now, if a log message contains characters that cannot be
represented in the client's locale, that log message will simply show
up as:
"[unconvertible log msg]"
Graceful degradation would be nice here :-).
See the dev list thread "Re: converting unconvertible UTF-8 data" for
discussion of possible solutions.
My first idea was to write a fuzzy converter function that replaces
every unconverted byte with an escape sequence representing its
numerical code ("?\XXX" or somesuch).
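
A minimal sketch of such a fuzzy converter, in Python for brevity rather
than Subversion's own C. The name fuzzy_convert is hypothetical, and the
exact escape format is an assumption: "?\XXX" is read here as the byte's
value in three-digit decimal. The sketch also assumes the target charset
is stateless and ASCII-compatible.

```python
def fuzzy_convert(utf8_data: bytes, to_charset: str) -> bytes:
    # Hypothetical helper: convert UTF-8 bytes to to_charset, replacing
    # every byte that cannot be converted with an escape of the form
    # ?\DDD (DDD = the byte's decimal value; this format is an assumption).
    out = bytearray()
    # surrogateescape keeps invalid input bytes around as lone surrogates
    # instead of raising, so they can be escaped like any other failure.
    text = utf8_data.decode("utf-8", errors="surrogateescape")
    for ch in text:
        try:
            out += ch.encode(to_charset)
        except UnicodeEncodeError:
            # Recover the original UTF-8 bytes of this character and
            # escape each one individually.
            raw = ch.encode("utf-8", errors="surrogateescape")
            out += "".join("?\\%03d" % b for b in raw).encode("ascii")
    return bytes(out)
```

With this sketch, converting the UTF-8 bytes of "café" to ASCII keeps
"caf" and escapes the two bytes of the final character (195 and 169),
while converting to Latin-1 succeeds without any escapes.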
Then Ulrich Drepper pointed out that since this data is mainly for
human consumption, the "//TRANSLIT" behavior of glibc's iconv and GNU
libiconv would produce more readable output. We can at least detect
when we're using one of those iconv implementations and append that
option to the to-charset string where appropriate. (Marcus Comstedt
points out that
some iconv implementations automatically do transliteration for you,
and don't even tell you whether or not it's happened, which is sort of
unnerving.)
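
The "append that option" step could look roughly like the following
Python sketch. The helper name target_charset and the
translit_supported flag are hypothetical; how Subversion would actually
detect a glibc or GNU libiconv build is left out here and simply passed
in as a boolean.

```python
TRANSLIT_SUFFIX = "//TRANSLIT"

def target_charset(to_charset: str, translit_supported: bool) -> str:
    # Hypothetical helper: build the to-charset string handed to iconv.
    # translit_supported stands in for whatever detection is used to
    # recognize an iconv that understands the //TRANSLIT suffix.
    if not translit_supported:
        return to_charset
    # Do not append the option twice if the caller already supplied it.
    if to_charset.upper().endswith(TRANSLIT_SUFFIX):
        return to_charset
    return to_charset + TRANSLIT_SUFFIX
```

For example, "ISO-8859-1" becomes "ISO-8859-1//TRANSLIT" when
transliteration is available and is passed through unchanged otherwise.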
However, if you are on a system whose iconv doesn't support
transliteration, you'll still get "[unconvertible log msg]".
So there are various non-mutually-exclusive steps to take here:
- Write the fuzzy function with the escape codes, and use it where
transliteration is not available.
- Meanwhile, get Subversion doing transliteration where possible
(Ulrich may do this).
- Possible early fix: make "svn log" accept --force or
--message-encoding, so one can make it output the raw bytes or a
specific encoding, respectively.
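
As a rough illustration of the proposed early fix, the sketch below
models how the two options might select the output, in Python rather
than Subversion's C. The function render_log_message and its parameters
are hypothetical; neither option exists yet, and the sketch assumes the
stored log message is UTF-8.

```python
from typing import Optional

PLACEHOLDER = b"[unconvertible log msg]"

def render_log_message(utf8_msg: bytes,
                       locale_charset: str,
                       force: bool = False,
                       message_encoding: Optional[str] = None) -> bytes:
    # Hypothetical model of the proposed options:
    #   force            -> emit the raw stored bytes, unconverted
    #   message_encoding -> convert to that encoding instead of the locale's
    if force:
        return utf8_msg
    charset = message_encoding or locale_charset
    try:
        return utf8_msg.decode("utf-8").encode(charset)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Current behavior when conversion fails.
        return PLACEHOLDER
```

In this model, a message that cannot be shown in an ASCII locale still
yields the placeholder by default, but the raw bytes are available with
force=True and a usable rendering with message_encoding="latin-1".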