Issue 93433 - build breaks in libxml2 on Korean Windows due to special character
Status: RESOLVED FIXED
Product: Build Tools
Classification: Code
Component: code
Version: OOO300m4
Hardware: PC Windows Vista
Importance: P3 trivial
Target Milestone: 4.0.0
Assigned To: zhang jianfang
QA Contact: issues@tools
Depends on:
Blocks:
Reported: 2008-09-03 16:15 UTC by jeongkyu.kim
Modified: 2013-07-11 08:27 UTC (History)

See Also:
Issue Type: DEFECT
Latest Confirmation on: ---
Developer Difficulty: ---


Attachments
An experimental patch (2.27 KB, patch)
2009-02-19 16:01 UTC, tora3
runtest.c patch (481 bytes, patch)
2012-05-29 06:44 UTC, zhang jianfang
testapi.c patch (868 bytes, patch)
2012-05-29 06:45 UTC, zhang jianfang
makefile patch (382 bytes, patch)
2012-05-29 06:46 UTC, zhang jianfang

Description jeongkyu.kim 2008-09-03 16:15:07 UTC
The following error occurred in libxml2 while I was building OOO300_m4.

----------------------------------------
testapi.c
..\testapi.c : warning C4819: The file contains a character that cannot be represented in the current code page (949). Save the file in Unicode format to prevent data loss
..\testapi.c(294) : error C2001: newline in constant
..\testapi.c(295) : error C2143: syntax error : missing ')' before 'return'
NMAKE : fatal error U1077: 'c:\PROGRA~1\MICROS~1.0\VC\bin\cl.exe' : return code '0x2'
Stop.
----------------------------------------

I found that the offending line contains a special character that is not 
interpreted correctly on Korean Windows. I guess this applies to Chinese and 
Japanese Windows too.

static xmlChar gen_xmlChar(int no, int nr ATTRIBUTE_UNUSED) {
    if (no == 0) return('a');
    if (no == 1) return(' ');
    if (no == 2) return((xmlChar) 'ø'); /* <-- the problematic line */
    return(0);
}
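
My reading of why this line breaks, as a rough sketch rather than a description of what cl.exe does internally: in ISO-8859-1 the character 'ø' is the single byte 0xF8, and in the double-byte code pages 949 (Korean), 932 (Japanese) and 936 (Simplified Chinese) that byte falls in the lead-byte range, so the closing quote after it can be consumed as the second half of a two-byte character. The helper names below are hypothetical, and the model is simplified (it does not validate trail bytes).

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if byte `b` can start a two-byte
 * character in the given Windows code page, using the commonly
 * documented lead-byte ranges. Unknown code pages return 0. */
static int is_dbcs_lead_byte(unsigned int codepage, unsigned char b)
{
    switch (codepage) {
    case 932: /* Japanese (Shift_JIS) */
        return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
    case 936: /* Simplified Chinese (GBK) */
    case 949: /* Korean (Unified Hangul Code) */
        return b >= 0x81 && b <= 0xFE;
    default:
        return 0;
    }
}

/* Counts the characters a DBCS-aware reader would see in `s`.
 * Simplified model: a lead byte always swallows the byte after it. */
static size_t dbcs_char_count(unsigned int codepage,
                              const unsigned char *s, size_t len)
{
    size_t i = 0, n = 0;
    while (i < len) {
        if (is_dbcs_lead_byte(codepage, s[i]) && i + 1 < len)
            i += 2; /* lead byte plus whatever follows it */
        else
            i += 1;
        n++;
    }
    return n;
}
```

Fed the three bytes of the literal (quote, 0xF8, quote), this model sees only two characters under code page 949, which matches the "newline in constant" error: the character constant is never closed.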

A workaround for me was to convert the encoding of the file into utf8 using 
the following commands.

$ piconv -f iso-8859-1 -t utf8 ./wntmsci12.pro/misc/build/libxml2-2.6.31/testapi.c > testapi.c.utf8
$ cp testapi.c.utf8 ./wntmsci12.pro/misc/build/libxml2-2.6.31/testapi.c
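
For reference, the byte-level effect of that conversion can be sketched in C. This is only an illustration of the ISO-8859-1 to UTF-8 mapping, not what piconv does internally, and the function name is hypothetical. Note that after conversion the character constant holds two bytes, so its value may no longer be 0xF8 even though the file compiles.

```c
#include <stddef.h>

/* Converts `len` ISO-8859-1 bytes from `in` to UTF-8 in `out`.
 * `out` must have room for at least 2 * len bytes.
 * Returns the number of bytes written. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t i, o = 0;
    for (i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = c; /* ASCII is unchanged */
        } else {
            /* U+0080..U+00FF become a two-byte sequence */
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return o;
}
```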
Comment 1 zhangxiaofei.ooo 2008-09-16 02:45:17 UTC
I can confirm it on Chinese Windows.
Comment 2 tora3 2009-02-19 15:56:41 UTC
Environment:
  OS: Microsoft Windows XP Professional Version 2002 Service Pack 3 (Japanese)
  Cygwin: CYGWIN_NT-5.1
  Compiler: Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.21022.08 for 80x86 (English)
  Milestone: DEV300_m41

Error messages:
========================================
Building module libxml2
...
        cl.exe /nologo /D "WIN32" /D "_WINDOWS" /D "_MBCS" /W1 /MD /I.. /I..\include /I.\include /D "_REENTRANT" /D "HAVE_WIN32_THREADS" /D_CRT_SECURE_NO_DEPRECATE /D_CRT_NONSTDC_NO_DEPRECATE /D "NDEBUG" /O2 /Foint.utils.msvc\ /c ..\testapi.c
testapi.c
..\testapi.c : warning C4819: The file contains a character that cannot be represented in the current code page (932). Save the file in Unicode format to prevent data loss
..\testapi.c(294) : error C2001: newline in constant
..\testapi.c(295) : error C2143: syntax error : missing ')' before 'return'
NMAKE : fatal error U1077: 'c:\PROGRA~1\MICROS~1.0\VC\bin\cl.exe' : return code '0x2'
Stop.
dmake:  Error code 2, while making './wntmsci12.pro/misc/build/so_built_so_libxml2'

ERROR: Error 65280 occurred while making /cygdrive/o/ooo/cws/DEV300_m41/libxml2
rmdir /cygdrive/c/WINDOWS/TEMP/2712
dmake:  Error code 1, while making 'build_instsetoo_native'
========================================

Quick investigation:
In the error message, "code page (932)" denotes Japanese (Shift_JIS).

$ cd $SRC_ROOT/libxml2/wntmsci12.pro/misc/build/libxml2-2.6.31
$ find * -name '*.c' | xargs perl -ne 'do { print "$ARGV\n"; close(ARGV) } if m/[\x80-\xff]/'
doc/examples/testWriter.c
entities.c
runtest.c
testapi.c
xmlschemas.c

$ cd $SRC_ROOT/libxml2/unxsoli4.pro/misc/build/libxml2-2.6.31
$ perl -ne 'next if m{\A\s*/?\*}; printf "%s:%d: %s", $ARGV,$.,$_ if m/[\x80-\xff]/; close ARGV if eof' *.c | iconv -f iso-8859-1 -t utf-8
runtest.c:2713:     "urip://example.com/résumé.html",
testapi.c:294:     if (no == 2) return((xmlChar) 'ø');
testapi.c:402:     if (no == 2) return((xmlChar *) "nøne");

There are three lines with problematic characters encoded in ISO-8859-1.
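
The same kind of scan can be sketched in C for anyone without perl at hand. This is a minimal, hypothetical helper that only reports the first offending line of an in-memory buffer; unlike the second perl command above, it does not skip comment lines.

```c
#include <stddef.h>

/* Returns the 1-based line number of the first byte >= 0x80 in `buf`,
 * or 0 if the buffer is pure 7-bit ASCII. */
static size_t first_non_ascii_line(const unsigned char *buf, size_t len)
{
    size_t i, line = 1;
    for (i = 0; i < len; i++) {
        if (buf[i] >= 0x80)
            return line;
        if (buf[i] == '\n')
            line++;
    }
    return 0;
}
```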

$SRC_ROOT/libxml2/wntmsci12.pro/misc/build/libxml2-2.6.31/include/libxml/xmlstring.h
/**
 * xmlChar:
 *
 * This is a basic byte in an UTF-8 encoded string.
 * It's unsigned allowing to pinpoint case where char * are assigned
 * to xmlChar * (possibly making serialization back impossible).
 */
typedef unsigned char xmlChar;

Quick solution:
  Substitute the characters with the corresponding hexadecimal escape sequences.
  An experimental patch file is attached.
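
As an illustration of the idea (a sketch, not the contents of the attached patch), the three affected literals could be written with hexadecimal escapes so the source stays pure ASCII and compiles the same way under any code page. The function names with the _sketch suffix are hypothetical; the typedef mirrors the one in xmlstring.h.

```c
typedef unsigned char xmlChar; /* same definition as in xmlstring.h */

/* testapi.c:294, with 'ø' (0xF8 in ISO-8859-1) written as an escape */
static xmlChar gen_xmlChar_sketch(int no)
{
    if (no == 0) return('a');
    if (no == 1) return(' ');
    if (no == 2) return((xmlChar) '\xf8');
    return(0);
}

/* testapi.c:402, "nøne" with the same escape */
static const xmlChar *gen_name_sketch(void)
{
    return (const xmlChar *) "n\xf8ne";
}

/* runtest.c:2713, 'é' is 0xE9 in ISO-8859-1 */
static const char *gen_uri_sketch(void)
{
    return "urip://example.com/r\xe9sum\xe9.html";
}
```

One thing to watch with this approach: a hex escape in C consumes every hex digit that follows it, so if the next character were itself a hex digit (for example 'a' or 'e'), the string would have to be split, as in "\xf8" "a". That does not happen in these three literals.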

References:
  C++ Character Constants
    http://msdn.microsoft.com/en-us/library/6aw8xdf2.aspx

  C++ String Literals
    http://msdn.microsoft.com/en-us/library/69ze775t.aspx
Comment 3 tora3 2009-02-19 16:01:32 UTC
Created attachment 60315 [details]
An experimental patch
Comment 4 jeongkyu.kim 2009-02-20 03:38:19 UTC
Thanks for your effort, tora! The patch works fine on Korean (MS949) Windows.
Comment 5 jeongkyu.kim 2009-06-14 01:08:22 UTC
@mh: Is there any chance this patch can be applied?
Comment 6 eric.bachard 2011-12-16 09:04:32 UTC
@tora

1) Are you an Apache OpenOffice committer?

2) If not, I could commit your fix plus the mandatory changes, but I'll need your real name to credit you as the author.

Thanks in advance
Comment 7 Pedro Giffuni 2012-05-29 02:03:38 UTC
A small suggestion: please try to use unified diff format (-u) for the patches. The resulting patches are usually smaller and easier to read.
Comment 8 zhang jianfang 2012-05-29 05:59:52 UTC
The patch here is a little out of date. It is for libxml2 2.6.31, while the version used in AOO 3.4 is 2.7.6. I will try to generate an updated patch.

Also, the libxml2 patch only helps to build the English version in a DBCS environment. To build a DBCS version of AOO, for example Simplified Chinese, you still need an English build environment.
Comment 9 zhang jianfang 2012-05-29 06:44:58 UTC
Created attachment 77696 [details]
runtest.c patch
Comment 10 zhang jianfang 2012-05-29 06:45:34 UTC
Created attachment 77697 [details]
testapi.c patch
Comment 11 zhang jianfang 2012-05-29 06:46:18 UTC
Created attachment 77698 [details]
makefile patch
Comment 12 zhang jianfang 2012-05-29 06:55:35 UTC
I simply migrated tora's code to the latest AOO 3.4 code base, so the original author is still tora.

There are 3 patch files:
  libxml2-testapi.patch and libxml2-runtest.patch should be added directly to the libxml2\ directory.
  makefile.patch should be applied to libxml2\makefile.mk
Comment 13 zhang jianfang 2012-05-31 02:08:32 UTC
Several people have complained about this issue (see http://markmail.org/message/4ef7qvgaurduvnlt?q=93433), so I will take the bug and deliver the patch to 3.4.
Comment 14 zhang jianfang 2012-05-31 02:19:58 UTC
Committed in revision r1344534 with log message,

Fix issue #93433: build breaks in libxml2 on Korean Windows due to special character

* /libmxl2/libxml2-testapi.patch : replaced '\248' encoded in ISO-8859-1 with '\xf8'
* /libmxl2/libxml2-runtest.patch : replaced 'e' encoded in ISO-8859-1 as in 'resume' with \xe9

Patch by: tora3@nichoume.com
Comment 15 hdu@apache.org 2013-07-11 08:27:37 UTC
Updated target to release that will contain the fix.