Key: FOR-125
Type: New Feature
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Rick Tessner
Reporter: Nick Chalko
Votes: 1
Watchers: 1
Forrest

Produce formatted plain text output

Created: 28/Mar/04 09:51 AM   Updated: 04/Dec/04 03:21 AM
Component/s: Core operations
Affects Version/s: 0.6
Fix Version/s: 0.7

Time Tracking:
Not Specified

File Attachments:
document2txt_patch.zip (Zip Archive, 23 kB), 2004-08-01 10:08 AM, Ross Gardler
fop2txt.patch (Text File, 3 kB), 2004-06-03 05:33 AM, Dave Brondsema


Description
Please provide the option of generating formatted plain text output.
All requests ending in .txt should generate plain text output.

Word wrapping should default to 80 characters.
It should allow for as much formatting as possible,
including:
lists
indents
tables
footnotes to hyperlinks
Strong and em.
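
As an illustration of two of the requested features (wrapping at 80 characters and turning hyperlinks into footnotes), here is a minimal Python sketch. It is not the Forrest implementation; the function name `render_paragraph` and the input shape (a list of plain strings and `(text, url)` pairs) are assumptions made for this example only.

```python
import textwrap


def render_paragraph(parts, width=80):
    """Render one paragraph as wrapped plain text.

    `parts` is a list whose items are either plain strings or
    (link_text, url) tuples. This input shape is hypothetical.
    Each link becomes "link_text [n]" and its URL is listed as
    footnote [n] after the wrapped paragraph.
    """
    words = []
    urls = []
    for part in parts:
        if isinstance(part, tuple):
            text, url = part
            urls.append(url)
            words.append(f"{text} [{len(urls)}]")
        else:
            words.append(part)

    # Wrap the running text; footnote lines are left unwrapped so
    # that long URLs stay intact.
    body = textwrap.fill(" ".join(words), width=width)
    notes = [f"[{i}] {url}" for i, url in enumerate(urls, start=1)]
    if notes:
        return "\n".join([body, ""] + notes)
    return body
```

Calling `render_paragraph(["See", ("the docs", "http://example.org/"), "for details."])` yields the sentence with a `[1]` marker, a blank line, and then the footnote line for the URL.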





Ross Gardler added a comment - 28/Mar/04 01:22 PM
I already did this once but then deleted all my files (doh!). I will need to redo it soon in order to regenerate the site.

Nicola Ken Barozzi added a comment - 05/Apr/04 09:54 PM
FOP can produce text output too. If we use that, it would be easy to keep it in sync with DTD changes, as we would only need to change the FO conversion.

Ross Gardler added a comment - 05/Apr/04 11:13 PM
I should have checked that. I just finished re-writing the XDoc to Text stylesheet. I'll convert to FO when I can. In the meantime I have to get working with SVN.

Juan Jose Pablos added a comment - 27/Apr/04 06:32 AM
I think that this issue is resolved. Can anyone close it?

Nick Chalko added a comment - 27/Apr/04 11:43 AM
Using SVN HEAD (is HEAD still the correct term for SVN?),
I just tried getting an index.txt, with no luck.

Juan Jose Pablos added a comment - 27/Apr/04 06:05 PM
:-( You are right, I don't know why I got that impression.

I guess that the proper name for HEAD is trunk, but everyone will understand that anyway.

Ross Gardler added a comment - 28/Apr/04 02:51 AM
I do have a semi-working style sheet (table formatting is bad). I'll attach to this bug. I intend to implement it and fix the remaining bugs but right now time is not on my side. Feel free to finish off what I started, otherwise I'll come back to this soon(ish).

Dave Brondsema added a comment - 03/Jun/04 05:30 AM
(since this is a new feature, I'm removing it from 0.6 target)

Using the FOPSerializer to text ends up being pretty ugly. http://xml.apache.org/fop/output.html#txt suggests some improvements, but I haven't tried yet because it'll require significant modifications to document2fo.xsl

Dave Brondsema added a comment - 03/Jun/04 05:33 AM
Here's a simple patch to enable .txt rendering via FOPSerializer if anybody wants to try to make it look better. Without improvement, it's not worth using this method.

Nicola Ken Barozzi added a comment - 17/Jun/04 07:12 AM
If we put this as-is in 0.7, eventually it may be a seed for someone to make it better.

Ross Gardler added a comment - 01/Aug/04 10:08 AM
This patch is what I have working minus some code I took from elsewhere that simply gave me a string consisting of a character repeated x times. I used it to manage layout, draw underlines on titles etc.

In this patch I've stripped this code (as who actually owns it is unclear at this time) and inserted fixme's in all the places I used it. The resulting stylesheet is useable, but needs something to highlight headings etc.

There are still some things that need ironing out. The ones I am aware of are listed below; some are easy to fix (numbered lists), others less so (table layout):

- numbered lists aren't numbered
- lists within lists don't work
- table layout is not even attempted
- there is no neat wrapping of long lines of text
- headings are no longer emphasised (need the script mentioned above)

Be warned: I have not had the time to test this on a wide range of documents. It works for the few pages I need text output for. Please have a go at improving things.
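
For readers following along, the helper that was stripped from the patch (a string made of one character repeated x times, used for layout and for underlining titles) is simple to write from scratch. The Python sketch below is only an illustration of the idea, not the removed code and not the XSLT used in the stylesheet; the names `repeat_char`, `underline_heading` and `render_list`, and the choice of underline characters, are assumptions. It also shows how the same helper could cover two items from the list above: numbering ordered lists and indenting lists within lists.

```python
def repeat_char(char, count):
    """Return `char` repeated `count` times (empty if count <= 0)."""
    return char * max(count, 0)


def underline_heading(title, level=1):
    """Emphasise a heading by underlining it to its full length.

    The mapping of heading level to underline character is an
    arbitrary choice made for this sketch.
    """
    marks = {1: "=", 2: "-"}
    mark = marks.get(level, "~")
    return f"{title}\n{repeat_char(mark, len(title))}"


def render_list(items, ordered=False, depth=0):
    """Render a list as text lines, two spaces of indent per depth.

    `items` holds strings; a nested Python list is rendered as a
    sub-list under the item that precedes it.
    """
    lines = []
    number = 0
    for item in items:
        if isinstance(item, list):
            lines.extend(render_list(item, ordered, depth + 1))
        else:
            number += 1
            bullet = f"{number}." if ordered else "*"
            indent = repeat_char(" ", depth * 2)
            lines.append(f"{indent}{bullet} {item}")
    return lines
```

In an XSLT 1.0 stylesheet the equivalent of `repeat_char` would typically be a recursive named template, since the language has no built-in string repetition.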

Diwaker Gupta added a comment - 01/Aug/04 01:53 PM
I would like to point out an alternative solution that seems to me much easier to implement and works pretty well. This is also what is used by most DocBook DSSSL/XSL stylesheets.

What we can do is first render a "clean" HTML version of the page. This should be pretty easy, since the HTML conversion infrastructure is already in place (basically, just remove the menu and the tabs).

After this, we just run it through a text browser (like lynx, w3m, or elinks) and take a text dump. This way, all the formatting issues -- lists, lists inside lists, headings, borders, tables, images -- all of it is taken care of automatically, and we don't need to reinvent the wheel doing that.

I've done a LOT of text outputting with docbook and this method seems to work perfectly. IMFO, formatting directly to text might be more difficult, and perhaps redundant given that the text based browsers can already do a really good job.
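
The text-browser approach described above can be sketched as a small Python wrapper. This is an illustration only, assuming `lynx` or `w3m` is installed and on the PATH; the function names `build_dump_command` and `html_to_text` are hypothetical and nothing here is part of Forrest. By default `lynx -dump` appends a numbered list of link targets, which is close to the "footnotes to hyperlinks" item in the description.

```python
import shutil
import subprocess


def build_dump_command(html_path, browser="lynx", width=80):
    """Build the command line that dumps an HTML file as text."""
    if browser == "lynx":
        # -dump writes the formatted page to stdout; -width sets
        # the number of columns used for the dump.
        return ["lynx", "-dump", f"-width={width}", html_path]
    if browser == "w3m":
        # -cols sets the column width used together with -dump.
        return ["w3m", "-dump", "-cols", str(width), html_path]
    raise ValueError(f"unsupported browser: {browser}")


def html_to_text(html_path, browser="lynx", width=80):
    """Run the text browser and return its plain text dump."""
    if shutil.which(browser) is None:
        raise RuntimeError(f"{browser} is not installed")
    command = build_dump_command(html_path, browser, width)
    result = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return result.stdout
```

Looping `html_to_text` over every generated HTML page would cover a static site, but it depends on an external program being available at build time.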

Nick Chalko added a comment - 01/Aug/04 04:07 PM
I have used the text browser solution when I needed one or two files.
However, I want a solution that will easily handle all the documents in a site and that will also work for a dynamic Forrest install.

David Crossley added a comment - 02/Aug/04 12:05 PM

Ross Gardler added a comment - 04/Dec/04 03:21 AM
Rick has built the org.apache.forrest.plugin.text-output plugin for this.