
Key: FOR-311
Type: Improvement
Status: Open
Priority: Minor
Assignee: Unassigned
Reporter: Charles Palmer
Votes: 0
Watchers: 1
Forrest

OOo Headings bug causes Forrest to fail

Created: 06/Oct/04 10:25 PM   Updated: 12/Mar/07 03:41 PM
Component/s: Plugin: input.OpenOffice.org
Affects Version/s: 0.6
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments (all licensed for inclusion in ASF works):
  Name                             Type         Date                 Author               Size
  headings.zip                     Zip Archive  2006-10-12 06:58 AM  Cyriaque Dupoirieux  12 kB
  openoffice-writer.sxw            File         2006-10-12 06:58 AM  Cyriaque Dupoirieux  8 kB
  openoffice-writer2.sxw           File         2006-10-12 06:58 AM  Cyriaque Dupoirieux  9 kB
  openoffice-writer2forrest.xsl    XML File     2006-10-12 06:59 AM  Cyriaque Dupoirieux  12 kB
  openoffice-writer_clay.diff      File         2006-10-12 06:59 AM  Cyriaque Dupoirieux  0.2 kB
  openoffice-writer_update.sxw     File         2006-10-12 06:59 AM  Cyriaque Dupoirieux  10 kB
Environment:

Other Info: Patch available


Description
There appears to be a bug or feature in OpenOffice that affects how headings are stored as XML. In a newly created document, all headings are stored in <text:h> tags. But if an existing document that lacks a particular heading style is opened and that heading style is then applied, the new heading is stored in a <text:p> tag.

This causes Forrest to misinterpret headings, because the Forrest OOo XSL file identifies OOo headings by looking for the <text:h> tags.

(Moved detail from Description to Comment - see 2005-12-24)

Comments
Charles Palmer added a comment - 06/Oct/04 10:28 PM
Contains head.sxw and its matching content.xml (renamed head.xml); these are the unmodified, freshly created files. Also contains headless.sxw and its matching content.xml (renamed headless.xml), which is the result of deleting the original Heading 5 line and replacing it with a new Heading 5 line, now stored in a <text:p> tag.

Charles Palmer added a comment - 08/Oct/04 07:52 PM
I posted this as an issue on the OOo website and have just received this reply:

"Due to how headings are stored in the file format (OOo 1 + 1.1), the application doesn't know about style/heading mapping if the styles are not used. So when the last use of the 'Heading 5' style is removed from the document (and a save/load cycle is done), the mapping is lost.

The OASIS format improves on this by explicitly assigning default outline levels to styles. Hence I would think this problem should no longer occur in the new version.

dvo->es: I would consider this fixed for 2.0 (at least after CWS num0201 integration). I'm not really in a mood to still change this in the old version. Please decide how to flag this."

I don't know what they mean by "OASIS format" and "2.0"; presumably there is a new OOo version on the way. A warning, though: this may mean different use of tags in content.xml, which may require rewrites of openoffice-writer2forrest.xsl.

Clay Leeds added a comment - 21/Oct/04 03:04 AM
First attempt at updating openoffice-writer2forrest.xsl file to improve processing of Heading 2, Heading 3, Heading 4 & Heading 5. It would be great to get comments on this.

Place in the following location:

"forrest/src/core/context/resources/stylesheets/openoffice-writer2forrest.xsl"

Clay Leeds added a comment - 21/Oct/04 06:25 AM
modified openoffice-writer.sxw file which attempts to add Heading 3, Heading 4, Heading 5 to file.

Clay Leeds added a comment - 21/Oct/04 05:34 PM
diff between my modified 'openoffice-writer2forrest.xsl' file and the original file in the forrest distribution.

Clay Leeds added a comment - 21/Oct/04 05:47 PM
I recently added three files:

1. openoffice-writer.sxw - 9 kb
    An openoffice-writer.sxw file I was hoping to submit as a
    "patch" for forrest's seemingly flawed openoffice-writer.sxw file.
    My version includes a modified style.xml including Heading 3,
    Heading 4, and Heading 5, as well as a content.xml file which
    includes examples of these styles. The Forrest version is missing
    examples of Heading 3, Heading 4, & Heading 5, so the styles
    are not retained due to the bug.

2. openoffice-writer2forrest.xsl - 12 kb
    An improved openoffice-writer2forrest.xsl stylesheet which
    attempts to work around the bug by also formatting improperly
    structured OOo Headings. My version needs help, as it doesn't
    cycle through Headings (it only accounts for Heading 1-5). It
    would be better to use XSL to cycle through '"Heading "+n' or
    something.

3. openoffice-writer_clay.diff - 0.2 kb
    A diff of my openoffice-writer2forrest.xsl to the version in the
    forrest distribution.
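
The '"Heading "+n' idea from item 2 can be illustrated outside XSL. The following is only a minimal Python sketch, not the stylesheet's logic: it assumes heading styles are named "Heading 1", "Heading 2", and so on, and the function name `heading_level` is made up for illustration.

```python
import re

# Assumption: heading styles are named "Heading N" (N = 1, 2, 3, ...).
_HEADING_STYLE = re.compile(r"^Heading (\d+)$")


def heading_level(style_name):
    """Return N for a style named 'Heading N', or None for any other style."""
    match = _HEADING_STYLE.match(style_name or "")
    return int(match.group(1)) if match else None


for name in ("Heading 2", "Heading 10", "Standard"):
    print(name, "->", heading_level(name))
```

Parsing the number out of the style name avoids writing one rule per level, so levels beyond Heading 5 need no extra code.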

Frédéric Glorieux added a comment - 22/Oct/04 08:19 AM
I work with thousands of OOo files, have never seen this, and am unable to reproduce your bug. Don't you think it depends on the OOo version?

One thing is certain, though: your transformation doesn't handle automatic styles correctly.

Open an empty document, write a few words, start a new paragraph, then select that paragraph and make it bold.


<office:document>
 <office:automatic-styles>
  <style:style style:name="P1" style:family="paragraph" style:parent-style-name="Standard">
   <style:properties fo:font-weight="bold" style:font-weight-asian="bold" style:font-weight-complex="bold"/>
  </style:style>
 </office:automatic-styles>
 <office:body>
  <text:p text:style-name="P1">Un petit test</text:p>
  <text:p text:style-name="P1"/>
  <text:p text:style-name="Standard"/>
 </office:body>
</office:document>

I have some code to handle that; it works, but it is really not pretty.
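
For illustration only, one way to deal with automatic styles is to follow style:parent-style-name back to a named style. This is a rough Python sketch and not the code mentioned above. The namespace URIs are assumed to be the OpenOffice.org 1.x ones (the excerpt omits its xmlns declarations), and `parent_styles` and `resolve_style` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Assumption: OpenOffice.org 1.x namespace URIs; verify against a real content.xml.
NS = {
    "office": "http://openoffice.org/2000/office",
    "style": "http://openoffice.org/2000/style",
    "text": "http://openoffice.org/2000/text",
}


def q(prefix, local):
    """Build the '{uri}local' name that ElementTree uses for attributes."""
    return "{%s}%s" % (NS[prefix], local)


def parent_styles(root):
    """Map each automatic style name (P1, P2, ...) to its parent style name."""
    mapping = {}
    for style in root.iterfind("office:automatic-styles/style:style", NS):
        name = style.get(q("style", "name"))
        parent = style.get(q("style", "parent-style-name"))
        if name and parent:
            mapping[name] = parent
    return mapping


def resolve_style(name, mapping):
    """Follow automatic styles back to a named style, guarding against loops."""
    seen = set()
    while name in mapping and name not in seen:
        seen.add(name)
        name = mapping[name]
    return name


# Trimmed copy of the excerpt above, with the assumed xmlns declarations added.
SAMPLE = """<office:document
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:style="http://openoffice.org/2000/style"
    xmlns:text="http://openoffice.org/2000/text">
 <office:automatic-styles>
  <style:style style:name="P1" style:family="paragraph"
               style:parent-style-name="Standard"/>
 </office:automatic-styles>
 <office:body>
  <text:p text:style-name="P1">Un petit test</text:p>
 </office:body>
</office:document>"""

root = ET.fromstring(SAMPLE)
styles = parent_styles(root)
for para in root.iterfind("office:body/text:p", NS):
    print(resolve_style(para.get(q("text", "style-name")), styles))
```

With the sample document this resolves P1 to Standard. The same lookup would let a paragraph whose automatic style derives from a heading style be recognised as a heading.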

One last tip: if you often need to check the XML that OOo generates, add an export-only filter that uses an identity XSL, then use File > Export and choose your filter.



Clay Leeds added a comment - 22/Oct/04 05:38 PM
original openoffice-writer.sxw file which attempts to add Heading 3, Heading 4, Heading 5 to file.

Clay Leeds added a comment - 22/Oct/04 05:49 PM
I've added the pristine 'forrest seed' version of openoffice-writer.sxw to this issue. This is the file I used to create the openoffice-writer.sxw file I modified (should've given my modified file a different name... sorry!).

As I mentioned, all I did was extract the styles.xml & content.xml files from the original 'forrest seed' version of openoffice-writer.sxw, then copy and paste the 'Heading 2' portions to create 'Heading 3', 'Heading 4' and 'Heading 5' sections.

Then, in content.xml, I similarly copied the 'Heading 2' portions to create 'Heading 3', 'Heading 4' and 'Heading 5' sections.

Then I created a zip archive using the Mac OS X built-in 'Archive'. Perhaps the Mac OS X version of ZIP does something differently.

I hope this sheds some light on the subject.

Web Maestro Clay

Clay Leeds added a comment - 29/Oct/04 07:51 PM
New and improved openoffice-writer.sxw file which includes all known styles which (I think) should be included in the forrest openoffice-writer file:

- Title
   (not present? Forrest currently uses File > Properties > Description > Title, but this means that the document does not have a title when in sxw form)
- Heading 1
- Heading 2
- Heading 3
- Heading 4 (not present currently)
- Heading 5 (not present currently)
- Unordered List
- Ordered List
- Table
- Boxes
   o Forrest: Source
   o Forrest: Warning
   o Forrest: Note
   o Forrest: FixMe
- Character-based Styles
   o bold (strong emphasis)
   o italic (emphasis)
   o Forrest: Above (<sup>above</sup>)
   o Forrest: Below (<sub>below</sub>)
   o Forrest: Code (<code>source code</code>)

David Crossley added a comment - 24/Dec/05 11:18 AM

You can reproduce this effect as follows:
 
1 Create a new text file.
2 Create 20 lines of text, one of each style Heading 1-10, with a "default" format line between each.
3 Save this file (as "head.sxw" for example), unzip it and examine its content.xml. You will see that all of the headings are of the format:

 <text:h text:style-name="Heading 5" text:level="5">Heading 5</text:h>

4 Copy head.sxw to headless.sxw.
5 Open headless.sxw, delete the Heading 5 line, then save and close the file.
6 Reopen headless.sxw, and add a new line where the original Heading 5 line had been. Change the style of this new line to Heading 5.
7 Save this file, unzip it and examine its content.xml. The new line of style heading 5 is:

  <text:p text:style-name="Heading 5">New heading 5</text:p>


So perhaps openoffice-writer2forrest.xsl should reconsider how it matches headings. Instead of the current:

  <xsl:template match="text:h[@text:level='1']">

maybe it should match text:h or text:p with an attribute text:style-name="Heading 1" etc.
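
To make the proposed rule concrete, here is a small Python sketch of the same decision; it is not a patch to the stylesheet. It assumes the OpenOffice.org 1.x text namespace URI, and `outline_level` is a hypothetical name. The rule: trust text:level on a text:h element, otherwise fall back to a "Heading N" style name on either text:h or text:p.

```python
import xml.etree.ElementTree as ET

# Assumption: OpenOffice.org 1.x text namespace URI.
TEXT_NS = "http://openoffice.org/2000/text"
TAG_H = "{%s}h" % TEXT_NS
TAG_P = "{%s}p" % TEXT_NS


def outline_level(elem):
    """Return the heading level of a text:h or text:p element, else None."""
    if elem.tag == TAG_H:
        level = elem.get("{%s}level" % TEXT_NS, "")
        if level.isdecimal():
            return int(level)
    if elem.tag in (TAG_H, TAG_P):
        # Fallback: derive the level from a style named "Heading N".
        style = elem.get("{%s}style-name" % TEXT_NS, "")
        prefix, _, number = style.partition(" ")
        if prefix == "Heading" and number.isdecimal():
            return int(number)
    return None


XMLNS = 'xmlns:text="%s"' % TEXT_NS
proper = ET.fromstring(
    '<text:h %s text:style-name="Heading 5" text:level="5">Heading 5</text:h>' % XMLNS
)
broken = ET.fromstring(
    '<text:p %s text:style-name="Heading 5">New heading 5</text:p>' % XMLNS
)
plain = ET.fromstring('<text:p %s text:style-name="Standard">Body</text:p>' % XMLNS)

for elem in (proper, broken, plain):
    print("".join(elem.itertext()), "->", outline_level(elem))
```

Both Heading 5 forms from the reproduction steps come out as level 5, while the Standard paragraph is left alone.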

(Hopefully I will work out how to attach the .sxw and .xml files so you can see these examples easily)

Cyriaque Dupoirieux added a comment - 12/Mar/07 08:38 AM
This plugin should be superseded by the odt plugin.
Are we sure that we want to maintain this version, which handles a format abandoned by OOo?

Clay Leeds added a comment - 12/Mar/07 03:41 PM
Until the ODT plugin is completed, I would recommend that this issue stay open. The plugin is still useful to some and, as far as I know, actually works better than the ODT plugin when it comes to parsing OOo Impress (PowerPoint-esque presentation) files.