Issue 81597 - [MWEx] Export filter "MediaWiki" fails to export Non-breaking space
Summary: [MWEx] Export filter "MediaWiki" fails to export Non-breaking space
Status: CLOSED FIXED
Alias: None
Product: Writer
Classification: Application
Component: code
Version: OOo 2.3 RC3
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: eric.savary
QA Contact: issues@sw
URL: http://specs.openoffice.org/writer/fi...
Keywords: needmoreinfo
Depends on:
Blocks:
 
Reported: 2007-09-15 08:31 UTC by norbert2
Modified: 2013-08-07 14:42 UTC
3 users

See Also:
Issue Type: PATCH
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Document containing a Non-breaking space (6.33 KB, application/vnd.oasis.opendocument.text)
2007-09-15 08:33 UTC, norbert2
Updated transformation (revision 2711) (40.47 KB, text/xml)
2007-09-20 22:28 UTC, haui

Description norbert2 2007-09-15 08:31:26 UTC
Hi.

- Please open the attached document. It contains a Non-breaking space.
- Export it as MediaWiki.
- View the exported result in a text editor:

The exported code is "1 2", but it should be "1&nbsp;2".
Comment 1 norbert2 2007-09-15 08:33:46 UTC
Created attachment 48242
Document containing a Non-breaking space
Comment 2 michael.ruess 2007-09-16 17:07:08 UTC
Reassigned to ES.
Comment 3 haui 2007-09-17 00:01:47 UTC
To be precise, the exported wiki file contains the following contents:

> hexdump -C src/test/fixtures/nbsp.txt
00000000  31 c2 a0 32 0a                                    |1..2.|
00000005

This means that the "1" and "2" are separated by the Unicode character U+00A0,
which is exactly the non-breaking space from the ODT document. This Unicode
character is perfectly legal in MediaWiki. Escaping all non-ASCII characters with
HTML/XML entities would make the result hard to read (German umlauts, Chinese text...).
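
The byte sequence in the dump can be reproduced with a short Python sketch. This is only an illustration and not part of the filter; the variable names are made up for the example.

```python
# "1", U+00A0 (no-break space), "2", newline: the content of the exported file.
text = "1\u00a02\n"

# UTF-8 encodes U+00A0 as the two bytes c2 a0.
encoded = text.encode("utf-8")

print(encoded.hex(" "))  # 31 c2 a0 32 0a
```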
Comment 4 eric.savary 2007-09-18 12:01:17 UTC
To sum up @haui: do you mean WONTFIX?
Comment 5 norbert2 2007-09-18 14:32:38 UTC
I opened the output of your filter in Firefox (UTF-8) and copied it
into our MediaWiki (UTF-8).

The space breaks when the browser window is shrunk! (Umlauts, however, are
preserved.)

If I change the space to &nbsp;, it is non-breaking.

So I think you should change this in your filter.
Comment 6 haui 2007-09-20 21:01:01 UTC
OK, I can reproduce the issue. Although there is an NBSP character in the filter
output, it is either lost when copying and pasting into the wiki edit field or
swallowed by the MediaWiki software.

I'll fix this in the filter by quoting exactly this character as &nbsp;, as
Norbert suggested.
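
The idea of the fix can be sketched in Python as follows. This is not the actual transformation from the attachment; the function name `encode_nbsp` is hypothetical. Only U+00A0 is replaced, so umlauts and other non-ASCII text stay readable.

```python
NBSP = "\u00a0"  # Unicode no-break space


def encode_nbsp(text: str) -> str:
    """Replace each no-break space with the HTML entity; leave all other characters alone."""
    return text.replace(NBSP, "&nbsp;")


print(encode_nbsp("1\u00a02"))  # 1&nbsp;2
print(encode_nbsp("Müller"))    # Müller (unchanged)
```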
Comment 7 haui 2007-09-20 22:28:05 UTC
Created attachment 48383
Updated transformation (revision 2711)
Comment 8 haui 2007-09-20 22:30:37 UTC
Besides the fix for the NBSP issue, the uploaded update contains the following
changes to the original version (revision 2639) attached to issue 48409.

------------------------------------------------------------------------
r2642 | hauma | 2007-05-30 22:41:20 +0200 (Mi, 30 Mai 2007) | 4 lines

- Allow using tabs for indentation of preformatted text.
- Prevent double newlines for separating paragraphs in preformatted text by default.
- Suppress lists for implementing section numbering.
- Made the transformation user-configurable through user info variables.
------------------------------------------------------------------------
r2708 | hauma | 2007-09-20 23:02:05 +0200 (Do, 20 Sep 2007) | 1 line

Added encoding of non-breakable spaces as HTML entity.
------------------------------------------------------------------------
r2711 | hauma | 2007-09-20 23:14:25 +0200 (Do, 20 Sep 2007) | 1 line

Bugfix: All '<' characters must be escaped during rendering a text block
containing </nowiki> markup.
------------------------------------------------------------------------
Comment 9 eric.savary 2007-09-21 14:28:27 UTC
@MAV: reassign to you. I guess we can make a CWS for this?
Comment 10 matthias.mueller-prove 2007-09-21 17:10:07 UTC
add link to original spec
add mmp to cc list

Seems to me an OOo 2.3.1 PATCH/bugfix issue.
Comment 11 eric.savary 2007-09-21 17:16:47 UTC
Then -> 2.3.1
Comment 12 Mathias_Bauer 2007-10-05 14:44:39 UTC
Eric, do you want to keep the 2.3.1 target even though the patch contains more than
"just the fix"? As you will have to test it, it should be your decision. For me
it would be OK.
haui, thanks for your great support!
Comment 13 mikhail.voytenko 2007-10-09 11:20:47 UTC
The patch will be integrated into one of the next cws.
Comment 14 mikhail.voytenko 2007-10-09 16:37:31 UTC
The patch is committed to cws fwk76.
Comment 15 mikhail.voytenko 2007-10-17 10:25:32 UTC
MAV->ES: Please verify the issue.
Comment 16 eric.savary 2007-10-23 13:43:14 UTC
@haui: please explain how to test those features:

- Suppress lists for implementing section numbering.
- Made the transformation user-configurable through user info variables.
- Bugfix: All '<' characters must be escaped during rendering a text block
containing </nowiki> markup.

What was the state before/after implementing it?
A step by step description of how to test it.

Thank you!

PS: the other fixes are working well.
Comment 17 haui 2007-10-24 21:01:15 UTC
Hi,

for "Suppress lists for implementing section numbering", transform the
OpenDocument Spec to MediaWiki with and without the patch. Without the patch,
headings are rendered 

 #
 ##
 ### '''Heading 3'''

 Text.

I was not yet able to construct a document in OOo that produces the same
internal XML structure as the OD spec.


"Made the transformation user-configurable through user info variables." is a
feature you may want to silently ignore. Otherwise, you have to test the
following:

1. Create a user-defined document info field (in document properties) named 
"CODE_TAB_REPLACEMENT" (instead of e.g. "Info 1"). Set the value of the field to
some string (e.g. 6 space characters). Create a paragraph with fixed-width font
and start the paragraph with tab characters. In the transformed result, tab
characters are replaced with the value of the user-defined field.

2. Create another user-defined field named "CODE_JOIN_PARAGRAPHS" and set the
value either to "true" or to "false". Create a sequence of paragraphs with
fixed-width font (e.g. a code snippet), where each line is in a separate
paragraph. If the document variable is set to "true", paragraph breaks are
treated as simple new-lines instead of being translated to a wiki paragraph
break (a double new-line character). If the value is "false", the transformation
translates paragraph breaks in code sections as if they were normal text.

3. Create another user-defined field named "CODE_STYLES" and enter the name of a
paragraph style that does not have a fixed-width font. Create a paragraph with
this style and transform the document. The paragraph should be rendered as
preformatted text with fixed-width font in the wiki output. This customization
can be used as a workaround if the font of a code paragraph is de facto fixed
width, but is not marked as such in the ODT file. An example is again the
OpenDocument specification.
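
To illustrate items 1 and 2, here is a minimal Python sketch of the described behaviour. It is not the filter's code: the function names and the `user_fields` dictionary are hypothetical, the four-space default for the tab replacement is an assumption (the thread does not state the default), and CODE_STYLES is not modelled.

```python
def replace_leading_tabs(paragraph: str, replacement: str) -> str:
    """Replace each tab at the start of the paragraph with the replacement string."""
    stripped = paragraph.lstrip("\t")
    tab_count = len(paragraph) - len(stripped)
    return replacement * tab_count + stripped


def render_code_paragraphs(paragraphs, user_fields):
    """Render fixed-width paragraphs according to the user-defined info fields."""
    # Assumed default: four spaces per tab.
    tab_replacement = user_fields.get("CODE_TAB_REPLACEMENT", "    ")
    # "true" (taken as the default here): one new-line between paragraphs.
    # "false": a double new-line, as for normal text.
    join = user_fields.get("CODE_JOIN_PARAGRAPHS", "true") == "true"
    separator = "\n" if join else "\n\n"
    return separator.join(
        replace_leading_tabs(p, tab_replacement) for p in paragraphs
    )


fields = {"CODE_TAB_REPLACEMENT": "......", "CODE_JOIN_PARAGRAPHS": "false"}
print(render_code_paragraphs(["\tif x:", "\t\treturn"], fields))
```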


For "Bugfix: All '<' characters must be escaped during rendering a text block
containing </nowiki> markup", transform the following text:

  < This < is < a < paragraph & with & multiple & xml > specials. >
  < This < is < another < paragraph & with </nowiki> & </nowiki> & multiple &
xml > specials. >
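
What this bugfix guards against can be sketched in Python. This is an illustration only: it assumes the text block is wrapped in <nowiki> tags and that a literal "</nowiki>" inside it would otherwise end the block early. The function name `render_nowiki` is hypothetical and the real transformation may differ.

```python
def render_nowiki(text: str) -> str:
    """Wrap text in <nowiki>, escaping every '<' if the text contains a closing tag."""
    if "</nowiki>" in text:
        # Escape all '<' characters so the embedded closing tag is inert.
        text = text.replace("<", "&lt;")
    return "<nowiki>" + text + "</nowiki>"


print(render_nowiki("a </nowiki> b < c"))
# <nowiki>a &lt;/nowiki> b &lt; c</nowiki>
```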

Best regards
Bernhard Haumacher

Comment 18 eric.savary 2007-10-25 10:59:09 UTC
@Bernhard: Thank you for the explanations! It works as described.

@mmp: can you please add these new features to the MediaWikiExport Spec?

1) Allow using tabs for indentation of preformatted text.
Default: Tab replaced with spaces, Can be customized using CODE_TAB_REPLACEMENT
variable as info field

2) Prevent double newlines for separating paragraphs in preformatted text by
default.
Can be turned off using CODE_JOIN_PARAGRAPHS variable, value "false" as info field

3) Suppress lists for implementing section numbering.



Verified in CWS fwk76
Comment 19 eric.savary 2007-11-02 14:57:22 UTC
Ok in m235
Comment 20 norbert2 2007-11-14 18:13:32 UTC
Issue 81833 contains a newer version of the export filter (Revision 2723).
Please include this one in the next OOo release.