Ross Gardler added a comment - 28/Mar/04 01:22 PM
I already did this once but then deleted all my files (doh!). Will need to redo it again soon in order to regenerate the site.
FOP can also produce text output. If we use that, it would be easy to keep it in sync with DTD changes, as we would only need to change the FO conversion.
I should have checked that. I just finished re-writing the XDoc to Text stylesheet. I'll convert to FO when I can. In the meantime I have to get working with SVN.
I think that this issue is resolved. Can anyone close it?
Using SVN HEAD (is HEAD still the correct term for SVN?)
I just tried getting an index.txt with no luck. :-(
You are right, I don't know why I got that impression.
I guess that the proper name for HEAD is trunk, but everyone will understand that anyway.
I do have a semi-working style sheet (table formatting is bad). I'll attach it to this bug. I intend to implement it and fix the remaining bugs, but right now time is not on my side. Feel free to finish off what I started, otherwise I'll come back to this soon(ish).
(since this is a new feature, I'm removing it from 0.6 target)
Using the FOPSerializer to produce text ends up being pretty ugly. http://xml.apache.org/fop/output.html#txt suggests some improvements, but I haven't tried them yet because they would require significant modifications to document2fo.xsl. Here's a simple patch to enable .txt rendering via FOPSerializer if anybody wants to try to make it look better. Without improvement, it's not worth using this method.
If we put this as-is in 0.7, eventually it may be a seed for someone to make it better.
This patch is what I have working minus some code I took from elsewhere that simply gave me a string consisting of a character repeated x times. I used it to manage layout, draw underlines on titles etc.
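The removed helper is described only as returning "a character repeated x times". As a minimal sketch of that idea, written in Python rather than in the stylesheet's own language, it could look like the following. The names `repeat_char` and `underline_title` are hypothetical and are not taken from the patch.

```python
def repeat_char(ch, count):
    """Return ch repeated count times (an empty string when count <= 0)."""
    return ch * max(count, 0)


def underline_title(title, ch="="):
    """Return the title followed by a line of ch of the same length."""
    return title + "\n" + repeat_char(ch, len(title))


if __name__ == "__main__":
    # Prints the heading with a row of '=' characters underneath it.
    print(underline_title("Text output"))
```

The same helper covers both uses mentioned above: padding for layout and underlining for titles.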
In this patch I've stripped this code (as who actually owns it is unclear at this time) and inserted fixme's in all the places I used it. The resulting stylesheet is usable, but needs something to highlight headings etc. There are still some things that need ironing out. The ones I am aware of are listed below; some of these are easy to fix (numbered lists), others less so (table layout):
- numbered lists aren't numbered
- lists within lists don't work
- table layout is not even attempted
- there is no neat wrapping of long lines of text
- headings are no longer emphasised (need the script mentioned above)
Be warned: I have not had the time to test this on a wide range of documents; it functions for the few pages I need text output on. Please have a go at improving things.
I would like to point out an alternate solution that seems to me much easier to implement and works pretty well. This is also what is used by most DocBook DSSSL/XSL stylesheets.
What we can do is first render a "clean" HTML version of the page. This should be pretty easy, since the HTML conversion infrastructure is already in place (basically, just remove the menu and the tabs). After this, we just run it through a text browser (like lynx, w3m, or elinks) and take a text dump. This way, all the formatting issues (lists, lists inside lists, headings, borders, tables, images) are taken care of automatically, and we don't need to reinvent the wheel. I've done a LOT of text output with DocBook and this method seems to work perfectly. IMFO, formatting directly to text might be more difficult, and perhaps redundant given that the text-based browsers can already do a really good job.
I have used the text browser solution when I need one or two files.
However, I want a solution that will easily handle all the documents in a site and that will also work for a dynamic Forrest install. See some email discussion: http://marc.theaimsgroup.com/?t=107512563400001
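For the static case, the text browser approach suggested above could be scripted over a whole directory of generated HTML. The following is only a rough sketch under stated assumptions: it assumes lynx or w3m is installed, the function names are hypothetical, and it does nothing for a dynamic Forrest install.

```python
import shutil
import subprocess
from pathlib import Path

# Arguments that make each text browser print the rendered page to stdout.
DUMP_COMMANDS = {
    "lynx": ["lynx", "-dump", "-nolist"],
    "w3m": ["w3m", "-dump", "-T", "text/html"],
}


def dump_command(browser, html_path):
    """Build the command line that dumps one HTML file as text."""
    return DUMP_COMMANDS[browser] + [str(html_path)]


def find_browser():
    """Return the first supported text browser found on PATH, or None."""
    for name in DUMP_COMMANDS:
        if shutil.which(name):
            return name
    return None


def text_path_for(html_path):
    """Map foo.html to foo.txt in the same directory."""
    return Path(html_path).with_suffix(".txt")


def dump_site(site_dir):
    """Write a .txt file next to every .html file under site_dir."""
    browser = find_browser()
    if browser is None:
        raise RuntimeError("no supported text browser (lynx or w3m) found on PATH")
    written = []
    for html_path in sorted(Path(site_dir).rglob("*.html")):
        result = subprocess.run(
            dump_command(browser, html_path),
            capture_output=True, text=True, check=True,
        )
        out_path = text_path_for(html_path)
        out_path.write_text(result.stdout)
        written.append(out_path)
    return written
```

Calling `dump_site("build/site")` (the path is an assumption for illustration) would then leave an `index.txt` beside each `index.html`. The HTML fed to it would still need the menu and tabs removed first, as described above.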
Rick has built the org.apache.forrest.plugin.text-output plugin for this.