Apache OpenOffice (AOO) Bugzilla – Issue 5276
HTML preview mode cannot handle large amounts of accents and punctuation
Last modified: 2013-08-07 14:38:26 UTC
Trying to load and edit a large (1M) all-text HTML document, I noticed that the HTML editor and the word processor don't show all of the text but only about 8% of it. The HTML editor shows the complete source but saves only the displayed part when I try to export it. Oddly, it takes a small <pre> text from the end of the document and displays it at the end of the part it does display. I even tried removing the <pre> section, with the same result: the text abruptly ends at 21 out of 318 rows (as measured in NoteTab).
You can find the original document at http://abu.cnam.fr/cgi-bin/donner?nddp1
Federico, thanks for posting. I tried downloading the web page at the URL you provided, but it was only about 27Kb. Do I have the right web page?
They changed something; now it's at http://abu.cnam.fr/cgi-bin/donner_html?nddp1 Federico Spano
Just want to add some notes to this issue. It is not the size of the file that is causing the problem. My guess is that there is a hidden character that is causing OOo to only display part of the HTML even though it has loaded the full source. The HTML in the original link is broken. I've run it through W3 Tidy but the problem still exists. I'm trying to isolate the part of the file that is causing the issue.
Duplicated on Win NT 4.0 SP6a, OOo 643.
User summary: HTML editor can't show and export large documents
Summary: The user originally reported that the document at the link was loaded by the HTML editor, but in the HTML preview mode only part of the HTML page was displayed. If you switch to the HTML source mode, the entire document is present. The problem appears to be the number of accents and punctuation marks in the HTML file. Once I start stripping out some of the accents and punctuation marks, I start seeing more of the HTML page in the HTML preview mode. Will attach a stripped-down test case.
Created attachment 3243 [details] Cut down version of the HTML file in the user provided link.
In the attached HTML file, try removing punctuation marks or the HTML accents. You'll see more and more of the HTML doc in the HTML preview mode. Note that I ran the HTML file at the user link through Tidy before I started to trim it down.
to me
ES->MIB: in source mode, the text ends with ", il tomba en poussière.". In WYSIWYG mode, ends with "cardinal tout décontenancé et" which indeed represents only 8% of the text.
The issue is that the document contains a single paragraph, and that the paragraph size is limited to 65535 characters on OOo.
Fixed in CWS sw009
Last comment was wrong.
There are two options to solve this:
1. Increasing the maximum paragraph size
2. Adding a paragraph break when the maximum paragraph size is reached.
Since paragraphs that have more than 65535 characters are an exception and in fact not very useful, and since the paragraph size is limited by the string size (that is 65535 as well), option 2. seems to be the appropriate solution.
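For illustration only, option 2 could be sketched as below in Python. This is not the OOo implementation; the function name split_paragraph and the preference for breaking at a space are assumptions made for the example. Only the 65535 limit comes from this issue.

```python
# Limit quoted in this issue; everything else here is a hypothetical sketch.
MAX_PARAGRAPH_CHARS = 65535


def split_paragraph(text, limit=MAX_PARAGRAPH_CHARS):
    """Split text into pieces of at most `limit` characters.

    Prefers to break at the last space inside the window (assumption:
    breaking between words is nicer than breaking mid-word). Falls back
    to a hard break when the window contains no usable space.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    pieces = []
    start = 0
    while len(text) - start > limit:
        end = start + limit
        cut = text.rfind(" ", start, end)
        if cut <= start:
            # No space found (or only at the very start): hard break.
            pieces.append(text[start:end])
            start = end
        else:
            # Break at the space and drop it, as a paragraph break replaces it.
            pieces.append(text[start:cut])
            start = cut + 1
    pieces.append(text[start:])
    return pieces
```

Each returned piece would then be imported as its own paragraph, so no single paragraph exceeds the limit and no text is silently dropped.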
According to the OpenOffice.org roadmap (http://tools.openoffice.org/releases) this issue was retargeted to OOo Later.
*** Issue 23897 has been marked as a duplicate of this issue. ***
mib wrote: "The issue is that the document contains a single paragraph, and that the paragraph size is limited to 65535 characters on OOo." Thus this is a duplicate of issue 17171. Please decide which one to keep and mark the other one as a duplicate. (If keeping this one, don't forget to change the summary...)
duplicate *** This issue has been marked as a duplicate of 17171 ***
closed