Apache OpenOffice (AOO) Bugzilla – Issue 48409
[MWEx] Export to MediaWiki
Last modified: 2013-08-07 14:42:39 UTC
While it's possible to save documents as HTML, RTF or plain text, it would be a nice convenience to be able to save to one (maybe more) wiki formats - e.g. the format used by Wikipedia, or Twiki. It would be a great way to get lots of information currently locked up in Word files into a really open and collaborative format. Should be quite simple, too - easier than HTML export, I reckon. luke
This is of course not a defect. Flagged to "Enhancement".
I have written a Writer to DokuWiki macro. It is available from: http://homepages.paradise.net.nz/hillview/OOo/ As the macro is LGPL anyone is free to modify it for any other wiki engine format. Sometime soon I hope to make a MediaWiki version.
*** Issue 61131 has been marked as a duplicate of this issue. ***
At work, we've moved exclusively to OpenOffice, and we're seeing huge growth in both our internal and external wikis. Having an export from OOo would make the transition from documents to online collaboration that much easier, especially for the techno-phobics. We have 6000 internal and 14,000 external users of our MediaWiki.
Created attachment 33558 [details] Bash script plus XSLT providing basic OpenDocument to MediaWiki conversion.
A word processor should be able to export formatted documents to wiki-style source code with appropriate syntax. WYSIWYG is needed for wiki entry creation. The export should be in the list under "Save as...": choose "wiki" and then, in a submenu, choose the wiki program from the common wiki software (see the list at http://en.wikipedia.org/wiki/List_of_wiki_software ). This will generate a plain text file with wiki-style formatting tags, which can then be copied and pasted into a wiki entry. Another desired feature (of lesser importance) is the ability to import wiki source code into the word processor. It is of lesser importance because you can already copy & paste the HTML generated from wiki source code into the word processor. It is still desirable, so that there isn't an extra conversion step through HTML that has the opportunity to introduce formatting errors. This feature is necessary as there is no WYSIWYG way of creating or editing an entry in any wiki. Currently it is too difficult to enter a formatted document of 100+ pages into a wiki and create one or more wiki entries.
*** Issue 61572 has been marked as a duplicate of this issue. ***
added fma to CC-list
I think the best way to implement this is with an XSL transformation (like the one "haui" posted above), along the lines of the DocBook XML support found at http://xml.openoffice.org/xmerge/docbook/ Specifically, there would be an import and an export transformation between the wiki syntax and the OpenOffice.org format. In addition, the user has to use a template (as in stw) that limits the available styles of the document to those that are relevant to the Wikipedia syntax. In this way, it will be easy to import/export. Once the import/export support is in place as described on the http://xml.openoffice.org/xmerge/docbook/ page, you can easily add, let's say, the .wiki format to the supported formats that appear in File/Open/
Created attachment 33964 [details] Style template to use with the already attached odt2txt transformation.
As simos proposes, my odt2txt transformation indeed relies on some "special" (character) styles to generate special MediaWiki markup. Currently supported are the math environment and all sorts of links and images. To insert a wiki link in your text, you just type the target name, say "OpenOffice.org", and apply the character style "WikiLink" to it. In the transformed version of the text, this is converted to "[[OpenOffice.org]]". The same applies to mathematical formulas. If you want to insert the formula "E = m C²" into your text, just type it the wiki way and apply the character style "WikiMath" to it: "E = m C^2". During export, this is converted to "<math>E = m C^2</math>". Creating more complex wiki links and references to images is also supported. The text "my favorite office suite [OpenOffice.org]" (with character style WikiLink) is transformed to [[OpenOffice.org|my favorite office suite]] and therefore refers to the same wiki page as the simple link above, but is displayed as "my favorite office suite" in the text. You can now imagine how to wiki-include images. The attached style template "oopedia-style.ott" also assigns two keyboard shortcuts to the character styles, namely Ctrl-W to WikiLink and Ctrl-M to WikiMath. I'll also attach two example files in OpenDocument format that show the conversion at work. The first one, "oopedia-example.odt", is a minimal example that demonstrates exactly the link and math styles described above. The second one, "rytzsche-achsenkonstruktion-herleitung.odt", is a real-world example showing "the source of" an article I wrote for de.wikipedia.org using OpenOffice.org and the attached transformation. Yes, I would also like a Wikipedia-auto-upload-on-save mechanism, a WikiMath-to-native-OpenOffice.org-formula-conversion-(back-and-forth), WYSIWYG-wiki-image-display with WikiMedia-Commons-auto-image-upload-on-save, support for native OpenOffice.org styles for bold and italics, and of course a MediaWiki-import-mechanism.
None of them are available yet, but dreaming is still allowed.
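For readers who want to see the WikiLink and WikiMath rules from the comment above in executable form, here is a minimal Python sketch. It is not the attached XSLT; the function name render_span and the regular expression are hypothetical and only mirror the three examples given in the comment (plain link, labelled link, math).

```python
import re

# Hypothetical helper, not part of the attached transformation.
# A labelled link is typed as: display text [Target]
_LABELLED = re.compile(r"^(?P<label>.*\S)\s*\[(?P<target>[^\[\]]+)\]$")


def render_span(text: str, style: str) -> str:
    """Return MediaWiki markup for one text span with the given character style."""
    if style == "WikiMath":
        # Formulas are typed the wiki way and simply wrapped in <math> tags.
        return f"<math>{text}</math>"
    if style == "WikiLink":
        match = _LABELLED.match(text)
        if match:
            # "label [Target]" becomes a piped link.
            return "[[" + match["target"] + "|" + match["label"] + "]]"
        # A bare target name becomes a simple link.
        return "[[" + text + "]]"
    # Any other style is passed through unchanged in this sketch.
    return text


print(render_span("OpenOffice.org", "WikiLink"))
print(render_span("my favorite office suite [OpenOffice.org]", "WikiLink"))
print(render_span("E = m C^2", "WikiMath"))
```

Image references are left out because the comment does not spell out their syntax.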
Created attachment 33967 [details] Simple example document demonstrating wiki links and wiki math in OpenOffice.org.
Created attachment 33968 [details] Complex example for a real world wiki page using OpenOffice.org wiki syntax (source of de.wikipedia.org/wiki/Rytzsche_Achsenkonstruktion).
As a very non-technical OOo user, it would help me and others in a similar situation if there were step by step instructions on how to use these solutions.
I might add the following refinement/request to support the usage pattern of opening legacy format documents and pushing them to a MediaWiki... In addition to the menu item File|Export as PDF..., we need File|Export to Wiki..., (or Save to Wiki...) which 1) provides a list of wikis previously used as targets by this user (and a facility to specify another one), and 2) provides a mechanism of specifying both an article and (optionally) a section. For section selection, I suggest a tree view, most straightforwardly implemented by hacking the header markup on the target page. But there may be a better way. Also, we need File|Open from Wiki..., providing a mechanism for specifying both an article and (optionally) a section. It would be preferable to use MediaWiki's "Show Preview" button to perform WYSIWYG refreshes, so that templates and whatnot will be expanded in the context of the target wiki. For extra credit, have an option to show the Edit History of the wiki page as Recorded Changes and/or extend the Edit|Compare Document... to take a historical edit of the page as the "other" document.
For all of you who are not fans of Unix command line tools, a page has been set up that demonstrates the odt2txt conversion utility online: http://www.ipd.uni-karlsruhe.de/~hauma/odt2txt/
Hello! Are we interested in integrating it somehow into OpenOffice.org? I might be able to help with installation-related things.
Hello! Is there an updated version of this filter? I have created an integration patchset for this filter, so if you wish you can integrate it. I haven't tested it (with a build) yet, but I will do so later.
Created attachment 44964 [details] Mediawiki (export) integration patch rev. 1
set issue type to patch and reassign to project lead for review.
Thanks for the patch; I'm currently a "little" bit busy but will try to review the patch and do some testing as soon as possible. I also will assign a target as soon as I will know more about it.
@kami: did I miss a patch? I get this message:
Making: ../wntmsci11.pro/bin/osl/setup_osl.inf
guw.exe /usr/bin/perl /home/martin/OpenOffice-build/OOF680_m14/solenv/bin/par2script.pl -i ../wntmsci11.pro/par,/home/martin/OpenOffice-build/OOF680_m14/solver/680/wntmsci11.pro/par @@/tmp/mkrUGyED -o ../wntmsci11.pro/bin/osl/setup_osl.inf
par2script -i ..\wntmsci11.pro\par,C:\cygwin\home\martin\OpenOffice-build\OOF680_m14\solver\680\wntmsci11.pro\par @@C:\cygwin\tmp\mkrUGyED -o ..\wntmsci11.pro\bin\osl\setup_osl.inf
Checking directories ... Done
ERROR: No directory defined for file: gid_File_Stw_MediaWikiTemplate!
Checking File ...
dmake: Error code 255, while making '../wntmsci11.pro/bin/osl/setup_osl.inf'
---* tg_merge.mk *---
I am not sure if you applied the mediawiki-scp2-directory.diff patch. Please check for "gid_Dir_Share_Xslt_Wiki" in scp2/source/ooo/directory_ooo.scp. If you find it, you are right and I did something wrong; otherwise please patch this file :o)
>I am not sure if you applied mediawiki-scp2-directory.diff patch.
it has been applied.
>Please check for "gid_Dir_Share_Xslt_Wiki" at scp2/source/ooo/directory_ooo.scp
>if you found - you are right and I did something wrong else please patch this
>file :o)
yes, it has been found.
I am so sorry, mh. I left two semicolons out of mediawiki-scp2-xsltfilter-file.diff. Those caused the problem. I am going to attach it now. I am only posting the affected diff file. Sorry again!
Created attachment 45149 [details] Corrected SCP2 file for file handling
@kami: no problem, I should have seen the missing semicolon myself. Now it compiles like a charm but crashes if I try to export in this format.
The crash is caused by the fact that the user data contains 5 elements but the adaptor code expects 6. Of course the adaptor shouldn't crash (easy to fix) but probably the code won't work correctly. I'm currently trying.
Created attachment 45157 [details] the corrected filter configuration
Here's the obvious meaning of the userdata elements:
userdata[0] ConvertClass
userdata[2] Import
userdata[3] Export
userdata[4] ImportStyleSheet
userdata[5] ExportStyleSheet
userdata[6] PrettyPrinting (optional)
So I added a comma before the export style sheet. The crash is gone but I only get an empty file. Obviously the filter still needs some work.
Just for learning: Was the problem here: <prop oor:name="UserData"><value oor:separator=",">com.sun.star.documentconversion.XSLTFilter,,com.sun.star.comp.Writer.XMLImporter,com.sun.star.comp.Writer.XMLExporter,,../share/xslt/wiki/odt2mediawiki.xsl</value></prop> ? KAMI
Svante, the filter does not produce any output, though it neither throws an exception nor returns an error. Can you find out why? And BTW: shouldn't we change the filter adaptor code so that it complains if msUserData has less than 5 elements? Currently it crashes in this case.
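To illustrate the userdata layout and the "complain instead of crash" idea discussed above, here is a rough Python sketch. It is not the adaptor code; parse_user_data and MIN_ELEMENTS are hypothetical names, the field names follow the list given earlier in the thread, and the minimum of 6 elements is taken from the crash analysis (5 elements present, 6 expected). The meaning of index 1 is not stated in this thread, so it is left unnamed.

```python
# Field names as listed in the thread; index 1 is not described there.
FIELD_NAMES = {
    0: "ConvertClass",
    2: "Import",
    3: "Export",
    4: "ImportStyleSheet",
    5: "ExportStyleSheet",
    6: "PrettyPrinting",  # optional
}
MIN_ELEMENTS = 6  # assumption: indices 0..5 must be present


def parse_user_data(value: str, separator: str = ",") -> dict:
    """Split a UserData value and label its elements, rejecting short lists."""
    parts = value.split(separator)
    if len(parts) < MIN_ELEMENTS:
        raise ValueError(
            f"UserData has {len(parts)} elements, expected at least {MIN_ELEMENTS}"
        )
    return {name: parts[i] for i, name in FIELD_NAMES.items() if i < len(parts)}


user_data = (
    "com.sun.star.documentconversion.XSLTFilter,,"
    "com.sun.star.comp.Writer.XMLImporter,"
    "com.sun.star.comp.Writer.XMLExporter,,"
    "../share/xslt/wiki/odt2mediawiki.xsl"
)
for name, element in parse_user_data(user_data).items():
    print(name, "=", repr(element))
```

With one comma missing before the style sheet path, the same call raises a ValueError instead of silently reading the wrong slot.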
@KAMI: yes, please see the attached xcu file
Thanks, I saw :o)
I specified the xsl file wrongly. Here is the updated file.
Created attachment 45180 [details] Corrected typedetection file
this latest file does not solve the problem.
Just for learning: what is the difference between com.sun.star.comp.Writer.XMLImporter and com.sun.star.comp.Writer.XMLOasisImporter ?
Also, we might try this: http://www.activasoft.com/OpenOffice2MediaWiki Does anyone have experience with how this filter performs on complex documents?
assignment got confused...
Fine, now I get a file. AFAIK "XMLOasisExporter" exports to ODF, "XMLExporter" exports to the old OOo file format. I will check how good the export is.
Well, a first test didn't take very long, and the results are not very attractive. I took a simple document that consists of text, lists, some hyperlinks and a table. Result of the conversion: many of the lists didn't make it, the table is gone, and so are all(!) hyperlinks. Also, non-ASCII characters (like custom quotes) didn't convert correctly. That is what I spotted in two minutes of review. All in all, the transformation doesn't look usable to me. Losing all hyperlinks is especially disastrous. I don't know if anyone is willing to fix the transformation. If not, we should set the issue type back to "FEATURE".
Svante, it seems that we don't need your help. Sorry for bothering. :-)
Created attachment 45200 [details] Improved transformation revision 339.
Just to increase confusion: I updated my transformation (odt2txt). It is now able to transform paragraphs, headings, bold and italics character styles, native OpenDocument links, and ordered and unordered lists. Additionally, it respects the special styles WikiMath and WikiLink. Tables are currently not supported.
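For anyone curious what such a mapping looks like, here is a very small Python sketch of the heading and paragraph part only. It is not the attached XSLT and covers none of the other features; the function name to_wiki is hypothetical, and mapping outline level n to n equals signs is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

# OpenDocument text namespace, as used in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
HEADING_TAG = f"{{{TEXT_NS}}}h"
PARAGRAPH_TAG = f"{{{TEXT_NS}}}p"
LEVEL_ATTR = f"{{{TEXT_NS}}}outline-level"


def to_wiki(content_xml: str) -> str:
    """Convert text:h and text:p elements to MediaWiki headings and paragraphs."""
    root = ET.fromstring(content_xml)
    blocks = []
    for element in root.iter():
        # itertext() flattens nested spans; character formatting is ignored here.
        text = "".join(element.itertext())
        if element.tag == HEADING_TAG:
            # Assumption: outline level n is written with n equals signs.
            marker = "=" * int(element.get(LEVEL_ATTR, "1"))
            blocks.append(f"{marker} {text} {marker}")
        elif element.tag == PARAGRAPH_TAG:
            blocks.append(text)
    # A blank line separates paragraphs in wiki text.
    return "\n\n".join(blocks)


sample = (
    '<office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<text:h text:outline-level="2">Intro</text:h>'
    "<text:p>Hello <text:span>world</text:span></text:p>"
    "</office:text>"
)
print(to_wiki(sample))
```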
Created attachment 45201 [details] Fixed disappearing of document internal references.
Created attachment 45202 [details] Revision 342 adding basic support for tables.
Wow! I didn't expect to see improvements happening so fast. :-) I will give it a try soon. Thanks for your work.
Yes it is much better. Are you planning to improve it more? I created a Crazy Wiki Page (http://wiki.services.openoffice.org/wiki/Crazy_wiki_test) where we can test the filter capability... I am attaching the odt file here KAMI
Created attachment 45213 [details] An crazy example
Crazy example works well but not perfectly.
* Tables with header - the header row is lost
* Underlined/small caps/strikethrough text not supported
* Preformatted text not supported
* Centered/right/justified paragraphs not supported
* No horizontal rules
* Footnotes not supported
* Images not supported
* No border around the table
But things are better than before :o) Thank you...
Yes, the transformation is much better now. It sometimes adds some brackets around text that irritate me. And it still has problems with non-ascii characters. I will upload an example document showing this later.
We should create a consistent testing document where all (or lots of) features can be tested: Features / special characters.
Created attachment 45214 [details] test document
You can see the result of the export of the attached test document at http://wiki.services.openoffice.org/wiki/Mbatest
Hi, I'm willing to further improve the transformation. Test documents are a really great help, thanks.
Great! Looking forward to the improvements. We should define which of the known issues or missing features we might need for an integration as a filter. My choice is:
- fix problems with non-ASCII characters
- remove superfluous brackets around text
- paragraph alignment should be supported
As a developer I also would like to have a way to export source code properly, could be as preformatted text. But I'm not sure if this is something we need for integration. Images surely are a problem as this can't be solved by a filter producing a file. A tedious manual procedure won't help a lot so perhaps we should leave images out for the moment.
Created attachment 45498 [details] Revision 2639 of the odt2wiki transformation (formerly known as odt2txt).
Created attachment 45499 [details] Feature overview of the odt2wiki transformation.
Hi there, I'm happy to present a greatly improved version (r2639) of the transformation. The attached odt2wiki-features.odt document gives an overview of the implemented features. The transformation result of the features document can be viewed online at http://wiki.services.openoffice.org/wiki/Odt2Wiki/Features. The transformation of character styles has been completely rewritten. Special care has been taken to keep the runtime complexity of the transformation linear in the document size (assuming a small fixed number of formattings per paragraph). On a mid-range PC, the Apache Xalan engine is still able to output 8 pages/second even with large documents (> 100 pages), resulting in a speed advantage of a factor of 68 over the alternative transformation mentioned above on a 64-page document (while now producing really superior results...). This enables transforming even mid-sized documents in an environment-friendly way. :-)
Regarding the problem with "special characters" mentioned above: The transformation always creates UTF-8 output, which might be non-standard on some platforms. In that case, it helps viewing the result in e.g. Firefox and switching the encoding to UTF-8 before cutting and pasting the text.
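A small Python illustration of the effect described above: the same UTF-8 bytes look garbled when read with a different encoding and come out fine when read as UTF-8. The sample text and the choice of cp1252 as the "wrong" encoding are assumptions for the demonstration, not taken from the test documents.

```python
# Sample text with an accented letter and a typographic opening quote.
wiki_text = "caf\u00e9 \u201cwiki"

# The transformation always writes UTF-8.
utf8_bytes = wiki_text.encode("utf-8")

# Reading those bytes with a platform default such as cp1252 garbles them ...
misread = utf8_bytes.decode("cp1252")

# ... while reading them as UTF-8 restores the original text.
correct = utf8_bytes.decode("utf-8")

print(misread)
print(correct)
```

Each non-ASCII character turns into two or three unrelated characters in the misread version, which matches the kind of damage reported for custom quotes.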
Thanks for the work, I'll give it a try on the weekend.
Sorry, I still couldn't get a running filter. Maybe I'm confused. Is the final xslt file (revision 2639) meant to be incorporated into the patch from KAMI, or should it be installed via "Tools - XML Filter Settings"?
Created attachment 45737 [details] Mediawiki (export) integration patch rev. 2 - Updated xslt and diff files.
I just created a patchset with latest files. I will test it tomorrow.
Thanks KAMI, but you made the same error again: we must use the XMLOasisExporter. ;-) I changed that and the patch runs fine. I will check the result tomorrow.
Just a first comment: my first tests are very promising. Great work, haui! Please give me some more days for testing. If no bigger problems show up I will add your patch to a Child Workspace. Thanks for your contribution!
Thanks ;o) Related to the above, I would like to know why other filters (XHTML, DocBook) use XMLExporter and XMLImporter and not the Oasis versions.
If the transformation converts from or to our old OOo 1.x file format ("sxw"), you have to use the XMLExporter; if the transformation uses ODF, you have to use the XMLOasisExporter. The wiki transformation obviously uses the latter, and it is strongly recommended to do so. Some old transformations from the time before ODF was ready are still available, so the XMLExporter is still supported.
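The rule above can be written down as a small lookup. This Python sketch assembles a UserData value in the layout quoted earlier in the thread; build_user_data and the "sxw"/"odf" keys are hypothetical names, and the full service name com.sun.star.comp.Writer.XMLOasisExporter is assumed by analogy with the XMLOasisImporter name mentioned above.

```python
# (import service, export service) per document format.
# The XMLOasisExporter full name is an assumption, see the note above.
SERVICES = {
    "sxw": (
        "com.sun.star.comp.Writer.XMLImporter",
        "com.sun.star.comp.Writer.XMLExporter",
    ),
    "odf": (
        "com.sun.star.comp.Writer.XMLOasisImporter",
        "com.sun.star.comp.Writer.XMLOasisExporter",
    ),
}


def build_user_data(document_format: str, export_stylesheet: str) -> str:
    """Assemble a comma-separated UserData value for an export-only XSLT filter."""
    importer, exporter = SERVICES[document_format]
    fields = [
        "com.sun.star.documentconversion.XSLTFilter",
        "",                  # index 1: meaning not stated in this thread
        importer,
        exporter,
        "",                  # no import style sheet
        export_stylesheet,
    ]
    return ",".join(fields)


print(build_user_data("odf", "../share/xslt/wiki/odt2mediawiki.xsl"))
```

Since the wiki style sheet works on ODF, "odf" is the entry that applies to this filter.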
Question to haui: would it be possible to extend your transformation to embed HTML code for elements not available in Wiki format? Of course it must be defined when and what but my question is only a general one about the basic feasibility.
Regarding HTML embedding: Technically speaking, the output method of the transformation is "text"; therefore, XML in the result document must be explicitly quoted by the transformation. But there is no restriction in principle that makes it impossible to embed HTML in the generated wiki text - even if this would reduce its readability.
I tried some exports and I'm impressed. This is a great addition to OpenOffice.org. Thanks for this contribution. I will add the patch to a CWS planned to be integrated into 2.3.
committed to CWS
Great, thanks all
Hi! I'm the QA representative for the CWS which will introduce the "File - Export - MediaWiki (.txt)" export filter. I am currently writing a specification (see: http://specs.openoffice.org/collaterals/template/2.0/OpenOffice-org-Specification-Template.ott) about it because we MUST have a specification for it. I corrected the summary of this issue. Ciao Éric PS: though issue reports should always remain technical and facts orientated: @haui: your contribution is great!!! I'm amazed about what you've done! It's a great Writer-Wiki transformation! Thanx! :)
Thanks Éric; you are right. I removed all occurrences of "WikiPedia" from all files and also registered the filter for Writer/Web. Please verify.
oops :-)
wrong button clicked. :-)
*now* please verify.
add link to spec: http://specs.openoffice.org/writer/fileIO/MediaWiki_Export_Filter.odt
seen good in cws mba32issues03 -> verified
Ok in m225
*** Issue 83645 has been marked as a duplicate of this issue. ***
hello dear other wiki filter hackers, you might want to support issue 40504: User defined Parameters for XSLT Filter
why? this would make it possible to pass parameters to an xslt export filter, like this:
select type of wiki
[ ] mediawiki
[ ] dokuwiki
...
you might also want to support good olde issue 7760: Option to force users to adhere to style sheets so we could guarantee that everything the user does is exportable to wiki...
Added keyword [MWEx] to the summary to be tracked by the query