Apache OpenOffice (AOO) Bugzilla – Issue 82902
OOo Writer corrupts document, can't open it after saving
Last modified: 2013-08-07 14:42:49 UTC
OOo Writer 2.3.0 (Windows XP, German) corrupted a document I was writing. The document is neither excessively large (138 kB, 90 pages) nor complex in any way (no embedded images etc.). Neither OOo nor the operating system crashed; Writer simply corrupted the file completely by itself. The setting "Options/Load-Save", "Size optimization for XML format" is checked (default setting).

Error message: "Format error discovered in the file in sub-document content.xml at position 2,728380 (row,col)." OOo Writer shows this error message in a dialog box. The only option is "OK", which closes the document; there is no option to force opening it or to repair the file.

I renamed the document to file.zip and checked content.xml with Validome (http://www.validome.org/xml/validate/). This gave me two errors:

(a) File name: content.xml Column: 1335 Error: The declaration of the element 'office:document-content' cannot be found. Error location: ...ma-instance" office:version="1.0"><office:scripts/><office:font-face-decl office:version="1.0">

(b) File name: content.xml Column: 728384 Error: Invalid byte 4 of a 4-byte UTF-8 sequence. Error location: ...cell table:style-name="Tabelle16.A1" office:value-type="string"><text:p t

According to the tutorial I was following, the first error (a) can be ignored, since OOo is supposed to generate XML that the validator reports as invalid. Regarding the second error (b), the last "1" in the following statement is marked as invalid: table:style-name="Tabelle16.A1". I have no idea what a valid value would be or what "Invalid byte 4 of a 4-byte UTF-8 sequence" is trying to tell me.
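Instead of renaming the file and uploading content.xml to an online validator, the failing position can also be located locally. The following is a minimal Python sketch using only the standard library: it reads content.xml straight out of the .odt ZIP container and reports where the expat parser stops. The function name find_xml_error is hypothetical, and the excerpt window is only approximate, since expat's column count may not map one-to-one onto bytes for non-ASCII text.

```python
import xml.parsers.expat
import zipfile


def find_xml_error(odt, member="content.xml", context=40):
    """Parse `member` from the .odt ZIP container (a path or binary file object).

    Returns None if the XML is well-formed, otherwise a tuple
    (line, column, message, excerpt) for the first error expat reports.
    """
    with zipfile.ZipFile(odt) as package:
        data = package.read(member)
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        line = data.split(b"\n")[err.lineno - 1]
        # expat's column is 0-based; the byte window around it is approximate.
        start = max(0, err.offset - context)
        excerpt = line[start:err.offset + context]
        message = xml.parsers.expat.ErrorString(err.code)
        return err.lineno, err.offset, message, excerpt
    return None
```

For example, find_xml_error("2007-10-23.odt") would return the line and column of the first well-formedness error. With size optimization enabled, the whole document body sits on a single long line, which is consistent with a reported position like 2,728380.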
Reassigned to ES.
Fortunately this doesn't happen very often; on the other hand, that's unfortunate for those of us who have to reproduce the problem. I failed to do so in several cases. Can you at least provide us with the damaged document? It would be even better if you could send us a document that, once modified and saved, produced exactly such an invalid file.
I sent two versions of the file in question to <mba>: the corrupted version [2007-10-23.odt] and the "last known good" version that OOo could open [2007-10-17.odt]. There were no unusual operations: I regenerated the ToC, added a few paragraphs, inserted a few columns in a table (list of sources), and let the table sort itself alphabetically. Please let me know if I can help in any other way. Thanks & Regards, -asb
Today, OpenOffice Writer 2.3.0 (German, Windows XP SP2) corrupted the document *again*. This time, the error reads: "Format error discovered in the file in sub-document content.xml at position 9032,324 (row,col)." [OK]

This raises the (at least for me) *very* urgent question of whether it is still advisable to use OpenOffice Writer 2.3.0, in its current state of obviously high instability, to write an important document. The last time the document (my master's thesis) was corrupted by OOo was just one week ago; that time, it took me one full day to manually reconstruct most of my changes. If I lose at least one day every week repairing damage done by the software, my work will be critically endangered. So *please* give me a hint: will this bug be fixed *soon*, or will I have to switch to other software to avoid losing all my work *again*?
This issue is not specific to 2.3.0; it is present in older versions as well. On the contrary, 2.3 already contains some fixes for it. The bug is a "time bomb": it depends on how often you have saved your document. Andreas Martens and I gave you some advice on the mailing list. We also offered to fix the document for you so that you can work with it until the bug is fixed.
I think it's time to confirm the issue
Andreas, please take over
> The bug is a "time bomb". It depends on how often you have saved
> your document.

I might add that there must be other factors besides the mere number of writes of the document to disk. Say version n-1 (with n = number of writes to disk) could be opened, and version n became corrupted. I then fell back to version n-1, reconstructed my changes from version n manually, and continued to work with version n-1+x (x = additional changes), which is currently version n+11 and still loads fine. Since I redid all my edits, and the number of disk writes of version n+11 clearly exceeds n-1, IMHO something else is most probably involved. (Autosaves shouldn't count, since I don't leave the document open when I'm not working on it, and autosave is set to the default of 15 minutes; I definitely save manually more often, which resets the autosave timer, right?)

Also, starting with the last reconstructed version of the document from one week ago, I changed the setting "Options/Load-Save", "Size optimization for XML format" to unchecked. The last 46 versions of the file (not version control, but manually created "derivatives" of the file: one new file per day, yyyy-mm-dd.odt, plus additional "subversions" from yyyy-mm-dd_a.odt to yyyy-mm-dd_z.odt) were written with this setting, which _did not_ prevent the new corruption; however, content.xml is now indeed more readable and editable when opened in a text editor.

Regarding the hint from the mailing list (OOo encoding "_" characters cumulatively when writing the file), I don't quite understand how I could influence what OOo encodes or writes. Also, Validome did not and does not point to constructs like the mentioned "_5f_5f", and reducing this string did not let me open the second corrupted file (I didn't try it on the first corrupted file).
However, in the current (twice manually reconstructed) document, I replaced "_5f_5f" with "_5f" in content.xml (16,376 replacements in total) and wrote it back into the odt/zip file, which at least did not seem to harm the file. Would it be advisable to do this from time to time? Thanks & regards, -asb
Created attachment 49417: Sample document showing the problem.
I was able to reproduce the problem with a newly created document:
1. Create a new Writer document.
2. Insert a table of contents.
3. Create a new paragraph style named "Content_Heading".
4. Edit the table of contents (Styles) and use the new style for the title.
5. Save the document and reload it; all is fine.
6. Delete the style "Content_Heading" in the Stylist.
7. Save the document and reload it.
8. Look at the styles of the table of contents: the style "Content_5f_Heading" is now assigned to the title.
9. From now on, every save & reload doubles the number of "_5f" in the table of contents style name. When the style name grows beyond 65000 characters, OOo can no longer open the file.
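Step 9 can be modeled in a few lines. This Python sketch is an assumption, not OOo code: it supposes that each "_" in a style name is written as "_5f_" (0x5F is the character code of "_", consistent with "Content_Heading" becoming "Content_5f_Heading") and that the bug re-escapes the already escaped name on every save. The names encode_style_name and cycles_until_too_long are hypothetical.

```python
# Hypothetical model of step 9. Assumption: every "_" in a style name is
# escaped as "_5f_" when written to content.xml.
def encode_style_name(name: str) -> str:
    return name.replace("_", "_5f_")


# Approximate name-length limit mentioned in this report.
NAME_LIMIT = 65000


def cycles_until_too_long(name: str, limit: int = NAME_LIMIT) -> int:
    """Count save & reload cycles until the stored name exceeds `limit`,
    assuming the escaped name is escaped once more on every save."""
    cycles = 0
    while len(name) <= limit:
        name = encode_style_name(name)  # buggy path: re-escaped on each save
        cycles += 1
    return cycles


if __name__ == "__main__":
    name = "Content_Heading"
    for cycle in range(1, 6):
        name = encode_style_name(name)
        # cycle number, name length, number of underscores
        print(cycle, len(name), name.count("_"))
    print("cycles until too long:", cycles_until_too_long("Content_Heading"))
```

In this model every cycle doubles the number of underscores and adds three characters per underscore, so after k cycles the name is 15 + 3(2^k - 1) characters long; it passes 65000 characters after 15 cycles (98,316 characters). That exponential growth is why the problem behaves like a "time bomb".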
We have two things to do:
1. Stop the duplication of "_5f" for unknown styles; then we will never corrupt a file again.
2. Be able to open a damaged file even if a style name exceeds a length of 65000 characters.
At least item 1 should be doable for 2.4.
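Item 1 can be illustrated with a small model. The sketch below is hypothetical and is not the change that was later made in XMLIndexTitleTemplateContext.cxx. It assumes that the only escape is "_" to "_5f_", and that the bug is that an index title style which no longer exists is loaded with its escaped name instead of the decoded one. Decoding unconditionally makes save and reload a round trip, so the name stays stable.

```python
# Hypothetical sketch, not the actual OOo fix. Only "_" is escaped here.
def encode_style_name(name: str) -> str:
    """Escape "_" as "_5f_" when writing (assumed rule)."""
    return name.replace("_", "_5f_")


def decode_style_name(xml_name: str) -> str:
    """Inverse of encode_style_name: turn "_5f_" back into "_"."""
    return xml_name.replace("_5f_", "_")


def load_title_style(xml_name: str, known: set, buggy: bool) -> str:
    """Return the display name the index title template ends up with."""
    display = decode_style_name(xml_name)
    if buggy and display not in known:
        # Assumed buggy path: an unknown style keeps its escaped name,
        # which then gets escaped again on the next save.
        return xml_name
    # Fixed path: always decode, even if the style no longer exists.
    return display


def save_load_cycles(display: str, known: set, buggy: bool, cycles: int) -> str:
    """Simulate `cycles` rounds of save (encode) followed by reload."""
    for _ in range(cycles):
        display = load_title_style(encode_style_name(display), known, buggy)
    return display
```

With buggy=True and no known styles, one cycle turns "Content_Heading" into "Content_5f_Heading", as in step 8 of the reproduction, and each further cycle lengthens it; with buggy=False the name is unchanged after any number of cycles.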
I want to give you some advice on how to avoid this situation and how to proceed if you run into trouble with this issue.

How to avoid this problem?
A self-created style with an underscore in its name can cause this trouble. Such a style should not be deleted while it is in use in a table of contents or elsewhere.

What to do if a document cannot be opened anymore?
First, you need a version that can be opened, so use the last backup of your document. But if you are able to edit the XML files of your document, you can rescue your newest version as well. Look into content.xml: if there are style names that contain long runs such as "_5f_5f_5f" or "_20_20_20", replace each run with a single "_5f" (or "_20"). You could check styles.xml as well.

Once you have a working version of your document, open it with OOo. Check every table of contents, alphabetical index, etc., and the styles used in these indexes. Style names with "_5f" or "_20" in their displayed name are suspicious! Check whether such a style is really visible in the Stylist. If it is not, edit the index and choose an existing style name. After that, you should be able to open and edit your document forever ;-)

But after a few rounds of opening, editing and saving, have another look at content.xml. If new "_5f_5f_5f" runs have appeared, try to find out what is using the unknown style.
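The manual repair described above can be scripted. The following is a minimal Python sketch using only the standard library; repair_odt and collapse_escape_runs are hypothetical names. It blindly collapses every run in content.xml and styles.xml, and it works on raw bytes so that a broken UTF-8 sequence elsewhere in the file does not stop it (it does not repair such a byte sequence itself). Entries are copied in their original order with their original compression, so the uncompressed "mimetype" entry stays first, as ODF packages require. Work on a copy and keep the original file.

```python
import re
import zipfile

# Collapse runs such as "_5f_5f_5f" into "_5f" and "_20_20_20" into "_20".
# Operates on bytes, so invalid UTF-8 elsewhere in the file is left alone.
_ESCAPE_RUN = re.compile(rb"(_5f|_20)(?:\1)+")


def collapse_escape_runs(xml: bytes) -> tuple:
    """Return (cleaned_bytes, number_of_runs_collapsed)."""
    return _ESCAPE_RUN.subn(rb"\1", xml)


def repair_odt(src, dst, members=("content.xml", "styles.xml")) -> int:
    """Copy the .odt package from src to dst, cleaning the given members.

    src and dst may be paths or binary file objects.
    Returns the total number of runs collapsed.
    """
    total = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in zin.infolist():  # original order keeps "mimetype" first
            data = zin.read(info.filename)
            if info.filename in members:
                data, count = collapse_escape_runs(data)
                total += count
            zout.writestr(info, data)  # reuses the entry's compress_type
    return total
```

For example, repair_odt("2007-10-23.odt", "2007-10-23_repaired.odt") writes a cleaned copy and returns how many runs it collapsed. Whether the result opens still depends on whether other damage is present; the reporter found that reducing the "_5f" runs alone did not rescue the second corrupted file.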
Fixed in CWS sw8u10bf01 XMLIndexTitleTemplateContext.cxx
Ready for QA.
Reopen to close
duplicate *** This issue has been marked as a duplicate of 82443 ***
closed