Issue 82902 - OOo Writer corrupts document, can't open it after saving
Summary: OOo Writer corrupts document, can't open it after saving
Status: CLOSED DUPLICATE of issue 82443
Alias: None
Product: Writer
Classification: Application
Component: open-import
Version: OOo 2.3
Hardware: All Windows XP
Importance: P3 Trivial
Target Milestone: ---
Assignee: eric.savary
QA Contact: issues@sw
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-23 23:13 UTC by asb
Modified: 2013-08-07 14:42 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Sample document shows the problem. (8.19 KB, application/vnd.sun.xml.writer)
2007-11-05 09:27 UTC, andreas.martens

Description asb 2007-10-23 23:13:56 UTC
OOo Writer 2.3.0 (German, Windows XP) corrupted a document I was writing. The 
document is neither excessively large (138 kB, 90 pages) nor complex in any 
way (no embedded images etc.). There was no crash of either OOo or the 
operating system; Writer simply corrupted the file all by itself. The 
setting "XML-Format auf Größe optimieren" ("Size optimization for XML format") 
under "Optionen/Laden-Speichern" (Options, Load/Save) is checked (default 
setting).

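Error message: "Formatfehler in Teildokument content.xml an Position
2,728380(Zeile,Spalte) in der Datei entdeckt" (in English: format error
discovered in sub-document content.xml at position 2,728380 (row,column)).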

OOo Writer shows this error message in a dialogue box. The only option is "OK", 
which closes the document; there is no option to force opening it or to repair 
the file.

I renamed the document to file.zip and checked content.xml with Validome
(http://www.validome.org/xml/validate/); this gave me two errors:

(a)

File name: content.xml
Column: 1335
Error: "Die Deklaration des Elementes 'office:document-content' kann
nicht gefunden werden." (The declaration of element
'office:document-content' cannot be found.)
Error location: ...ma-instance"
office:version="1.0"><office:scripts/><office:font-face-decl

office:version="1.0">

(b)

File name: content.xml
Column: 728384
Error: "Ungültiges Byte 4 einer 4-byte UTF-8-Sequenz." (Invalid byte 4 of
a 4-byte UTF-8 sequence.)
Error location: ...cell table:style-name="Tabelle16.A1"
office:value-type="string"><text:p t


According to the tutorial I was following, the first error (a) can be ignored, 
since OOo is said to generate invalid XML anyway. Regarding the second (b), the 
last "1" in the following attribute is marked as invalid:

   table:style-name="Tabelle16.A1"

I have no idea what a valid value would be or what "Ungültiges Byte 4 einer 
4-byte UTF-8-Sequenz" ("invalid byte 4 of a 4-byte UTF-8 sequence") is trying 
to tell me.
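
The check described above (look inside the .odt, then find the broken byte in content.xml) can be sketched in a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and it assumes the .odt is still a readable ZIP archive. It reports the byte offset of the first invalid UTF-8 sequence, which is not necessarily the same number as the column a validator reports.

```python
import zipfile


def first_invalid_utf8_offset(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_odt_part(odt_path, part="content.xml"):
    """Read one XML part from an .odt (a ZIP archive) and check its encoding.

    Returns None if the part decodes cleanly, otherwise a tuple of
    (byte offset, a few bytes of context around the problem).
    """
    with zipfile.ZipFile(odt_path) as odt:
        data = odt.read(part)
    offset = first_invalid_utf8_offset(data)
    if offset is None:
        return None
    return offset, data[max(0, offset - 40):offset + 8]
```

Renaming the file to .zip is not needed for this; `zipfile` opens the .odt directly, e.g. `check_odt_part("2007-10-23.odt")`.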
Comment 1 michael.ruess 2007-10-24 11:15:12 UTC
Reassigned to ES.
Comment 2 Mathias_Bauer 2007-10-24 12:38:51 UTC
Fortunately this doesn't happen very often - OTOH that's unfortunate for those
of us who have to reproduce the problem. I failed to do so in several cases.
Can you provide us at least with the damaged document?

It would be even better if you could send us a document that, once modified and
saved, produces exactly such an invalid file.
Comment 3 asb 2007-10-24 13:57:28 UTC
I sent two versions of the file in question to <mba> (the corrupted version 
[2007-10-23.odt], and the "last known good" that OOo could open 
[2007-10-17.odt]).

There were no unusual operations; I regenerated the ToC, added a few 
paragraphs, inserted a few columns in a table (list of sources), and let the 
table sort itself alphabetically.

Please let me know if I can provide any help.

Thanks & Regards, -asb
Comment 4 asb 2007-11-01 17:50:55 UTC
Today, OpenOffice Writer 2.3.0, German, Windows XP SP-2, corrupted the document 
*again*.

This time, the error reads:

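"Formatfehler in Teildokument content.xml an Position 9032,324(Zeile,Spalte) in 
der Datei entdeckt." [OK]
(In English: format error discovered in sub-document content.xml at position 
9032,324 (row,column).)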

This raises the (at least for me) *very* urgent question of whether it is still 
advisable to use OpenOffice Writer 2.3.0, in its current state of obviously 
high instability, to write an important document. The last time the document - 
it's my master's thesis - was corrupted by OOo was just one week ago; that time, 
it took me one full day to manually reconstruct most of the changes I had made. 
If I lose at least one day every week repairing damage done by the software, my 
work will be critically endangered.

So *please* give me a hint: Will this bug be repaired/fixed *soon* or will I 
have to switch to another software to prevent losing all my work *again*?
Comment 5 Mathias_Bauer 2007-11-01 18:25:30 UTC
This issue is not specific to 2.3.0. It is present in older versions as well. On
the contrary, 2.3 already has some fixes for it.

The bug is a "time bomb". It depends on how often you have saved your document.
Andreas Martens and I gave you some advice on the mailing list. We also offered
to fix the document for you so that you can work with it until the bug gets fixed.
Comment 6 Mathias_Bauer 2007-11-01 18:54:46 UTC
I think it's time to confirm the issue
Comment 7 Mathias_Bauer 2007-11-01 18:55:26 UTC
Andreas, please take over
Comment 8 asb 2007-11-02 00:37:49 UTC
> The bug is a "time bomb". It depends on how often you have saved
> your document.

I might add that there must be other factors besides simply the number of 
times the document was written to disk. Let's say version n-1 (with n = number 
of writes to disk) could be opened, and version n became corrupted. I then fell 
back to version n-1 and reconstructed my changes from version n manually, and 
continued to work with version n-1+x (x = additional changes), which has now 
become version n+11 and still loads fine. Since I re-did all my edits, and 
the number of disk writes of version n+11 clearly exceeds n-1, IMHO there is 
most probably something else involved. (Autosaves shouldn't be a factor, 
since I don't leave the document open when I'm not working on it, and autosave 
is set to the default of 15 minutes; I definitely save manually more often, 
which resets the autosave counter, right?)

Also, I unchecked the setting "XML-Format auf Größe optimieren" under 
"Optionen/Laden-Speichern" for the last reconstructed version of the 
document one week ago. The last 46 versions of the file were written with this 
setting (these are not under version control, but manually created copies, one 
new file per day named yyyy-mm-dd.odt, plus additional "subversions" counting 
from yyyy-mm-dd_a.odt to yyyy-mm-dd_z.odt). This _did_ _not_ prevent 
the new corruption; however, content.xml is now indeed easier to read and 
edit in a text editor.

Regarding the hint from the mailing list (OOo encoding "_" characters 
cumulatively when writing the file), I don't quite understand how I could 
influence what OOo encodes or writes. Also, Validome did not and does not 
point to constructs like the mentioned "_5f_5f", and reducing this string 
did not let me open the second corrupted file (I didn't try it on the first 
corrupted file).

However, in the current (twice manually reconstructed) document, I replaced 
"_5f_5f" with "_5f" in content.xml (a total of 16,376 replacements) and wrote 
the result back to the odt/zip file, which at least did not seem to harm the 
file. Would it be advisable to do this from time to time?

Thanks & regards, -asb
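
The manual replacement described in this comment could be scripted roughly as follows. This is a sketch under stated assumptions, not a supported repair tool: the function names are hypothetical, it collapses every run of two or more "_5f" into a single "_5f" (slightly different from a single-pass replace of "_5f_5f"), and it would also change body text that happens to contain such a run literally. It works on raw bytes, so it does not repair invalid UTF-8 sequences. The "mimetype" entry is written uncompressed, as ODF packages expect.

```python
import re
import zipfile

# Runs of two or more "_5f", matched on raw bytes so that broken UTF-8
# elsewhere in the file does not get in the way.
RUN_OF_5F = re.compile(rb"(?:_5f){2,}")


def collapse_runs(xml_bytes: bytes) -> bytes:
    """Replace each run of repeated "_5f" with a single "_5f"."""
    return RUN_OF_5F.sub(b"_5f", xml_bytes)


def repair_odt(src_path, dst_path):
    """Copy an .odt to dst_path, collapsing "_5f" runs in the XML parts."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():  # keeps the original entry order
            data = src.read(info.filename)
            if info.filename in ("content.xml", "styles.xml"):
                data = collapse_runs(data)
            # "mimetype" must stay uncompressed; deflate everything else.
            method = (zipfile.ZIP_STORED if info.filename == "mimetype"
                      else zipfile.ZIP_DEFLATED)
            dst.writestr(info.filename, data, compress_type=method)
```

Always write to a new file (for example `repair_odt("thesis.odt", "thesis_fixed.odt")`, file names chosen for illustration) and keep the original until the result opens correctly.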
Comment 9 andreas.martens 2007-11-05 09:27:34 UTC
Created attachment 49417 [details]
Sample document shows the problem.
Comment 10 andreas.martens 2007-11-05 09:36:19 UTC
I was able to reproduce the problem with a newly created document.
1. New Writer document
2. Insert a table of contents
3. Create a new paragraph style with the name "Content_Heading"
4. Edit table of contents/Styles and use the new style for the title.
5. Save the document and reload it; all is fine.
6. Delete the style "Content_Heading" in the stylist.
7. Save the document and reload it.
8. Have a look at the styles of the table of contents; you will see that
a style "Content_5f_Heading" is now assigned to the title.
9. From now on, every save & reload will double the number of "_5f" in the
table of contents/style. When the length becomes greater than 65000, the file
can no longer be opened with OOo.
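
The doubling in step 9 can be illustrated with a small simulation. It assumes that each save & reload re-encodes every underscore in the unknown style name as "_5f_"; this rule is inferred from the "Content_5f_Heading" example above and is not taken from the OOo source code. The function names are hypothetical.

```python
def encode_once(name: str) -> str:
    """Assumed encoding step: every "_" becomes "_5f_"."""
    return name.replace("_", "_5f_")


def saves_until_too_long(name: str, limit: int = 65000):
    """Count encoding rounds until the name is longer than `limit`.

    Returns None for names without an underscore, which never grow
    under the assumed rule.
    """
    if "_" not in name:
        return None
    rounds = 0
    while len(name) <= limit:
        name = encode_once(name)
        rounds += 1
    return rounds
```

Under this assumption the number of underscores doubles with every round, so `saves_until_too_long("Content_Heading")` returns 15: the first round gives "Content_5f_Heading", and fifteen rounds are enough to pass 65000 characters.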
Comment 11 andreas.martens 2007-11-05 09:39:59 UTC
We have two things to do:
1. Stop the duplication of "_5f" for unknown styles; then we will never corrupt
a file this way again.
2. Be able to open a damaged file even if a style name exceeds a length of 65000
characters.
At least 1. should be doable for 2.4.
Comment 12 andreas.martens 2007-11-06 08:21:06 UTC
I want to give you some advice on how to avoid the situation and how to proceed
if you run into trouble with this issue.

How to avoid this problem?
A self-created style with an underscore in its name can cause this trouble.
Such a style should not be deleted while it is in use in a table of contents or
somewhere else.

What to do if a document cannot be opened again?
First you need a version which can be opened, so use your last backup of the
document. But if you are able to edit the XML files of your document, you can
rescue your newest version as well. Have a look into content.xml. If there
are style names which contain a lot of "_5f_5f_5f" or "_20_20_20", you should
replace each such run with a single "_5f" ("_20"). You could have a look into
styles.xml as well.
Once you have a working version of your document, please open it with OOo.
Have a look at every table of contents, alphabetical index etc., and at
the styles used in these indexes. If there are style names with "_5f" or
"_20" in their displayed name, this is suspicious! Check whether such a style is
really visible in the stylist. If there is no such style in the stylist, you
have to edit the index and choose an existing style name.
If you have done this, you should be able to open and edit your document forever ;-)
But after a few rounds of opening, editing and saving, you could have a look
into content.xml again. If there are new "_5f_5f_5f" runs again, you should
try to find out what is using this unknown style.
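
The periodic check suggested above (looking for new "_5f_5f_5f" or "_20_20_20" runs) can be automated with a short script. This is a sketch only: the function names and the threshold of three repeats are illustrative assumptions, and a hit merely flags a part for manual inspection.

```python
import re
import zipfile

# Runs of repeated "_5f" or "_20", matched on raw bytes.
RUN = re.compile(rb"(?:_5f){2,}|(?:_20){2,}")


def longest_run(xml_bytes: bytes) -> int:
    """Return the largest number of consecutive "_5f" (or "_20") repeats."""
    # Each repeat is exactly three bytes long.
    return max((len(m.group(0)) // 3 for m in RUN.finditer(xml_bytes)),
               default=0)


def suspicious_parts(odt_path, threshold: int = 3) -> dict:
    """Map each XML part of the .odt to its longest run, if >= threshold."""
    found = {}
    with zipfile.ZipFile(odt_path) as odt:
        names = odt.namelist()
        for part in ("content.xml", "styles.xml"):
            if part in names:
                run = longest_run(odt.read(part))
                if run >= threshold:
                    found[part] = run
    return found
```

An empty result from `suspicious_parts(...)` means no long runs were found; a growing number between saves suggests that some index still refers to a deleted style.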
Comment 13 andreas.martens 2007-11-07 15:02:29 UTC
Fixed in CWS sw8u10bf01
XMLIndexTitleTemplateContext.cxx
Comment 14 andreas.martens 2007-11-21 09:50:42 UTC
Ready for QA.
Comment 15 eric.savary 2007-12-07 15:26:04 UTC
Reopen to close
Comment 16 eric.savary 2007-12-07 15:27:01 UTC
duplicate

*** This issue has been marked as a duplicate of 82443 ***
Comment 17 eric.savary 2007-12-07 15:47:20 UTC
closed