Issue 17171 - Paragraph cannot be longer than 65534 characters
Status: ACCEPTED
Product: Writer
Classification: Application
Component: ui
Version: OOo 1.0.3
Hardware: All All
Importance: P3 trivial with 95 votes
Target Milestone: ---
Assigned To: Oliver Specht
Keywords: ms_interoperability, oooqa
Duplicates: 5276 17329 18417 19969 38603 39770 41049 42283 50639 50764 53093 53473 54720 54890 55464 68340 69580 72234 74323 76313 82809 84902 85007 96176 107382 111982 112997
Depends on:
Blocks: 59185 64913
Reported: 2003-07-21 12:14 UTC by diegomann
Modified: 2013-10-05 00:24 UTC
CC List: 25 users

See Also:
Issue Type: DEFECT
Latest Confirmation on: ---
Developer Difficulty: ---


Attachments
file converted incorrectly and pages missing (62.28 KB, application/octet-stream)
2003-07-21 12:16 UTC, diegomann
simpler doc-file to reproduce the problem. (4.45 KB, application/zip)
2003-07-27 13:36 UTC, lohmaier
An example document where it is not possible to write anything more (6.71 KB, application/vnd.oasis.opendocument.text)
2005-04-06 09:57 UTC, ataraxia
very long paragraph sample (8.44 KB, application/vnd.oasis.opendocument.text)
2007-11-06 19:17 UTC, ohallot

Description diegomann 2003-07-21 12:14:18 UTC
Hi!

This file is not fully converted, there are a lot of pages missed.

Hope you will fix it.

Thanks in advance

Best Regards
DIEGO URRA
Comment 1 diegomann 2003-07-21 12:16:20 UTC
Created attachment 7888
file converted incorrectly and pages missing
Comment 2 pmartel60 2003-07-23 22:07:00 UTC
The file to be converted contains a long paragraph that extends for 
over 100 pages. The import filter appears to be truncating the 
paragraph precisely at the 65535th (i.e. max unsigned short) 
character.
Inserting a paragraph break a little before the truncation point
prevents the problem from occurring for another 64K characters.
A workaround is to use smaller paragraphs -- a new paragraph about 
every 12 pages should nicely prevent this problem from occurring under 
normal circumstances.
In 1.1 RC, the data is no longer truncated, but a new paragraph is 
started for you after every 65534 characters, even if it splits a 
word. I will report this as another issue.
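The two import behaviours described in Comment 2 can be modelled in a few lines. This is only an illustrative sketch in Python, not Writer's actual import filter: the names `MAX_PARA_LEN`, `import_truncating` and `import_splitting` are hypothetical, and the limit of 65534 is assumed from the issue summary.

```python
MAX_PARA_LEN = 65534  # assumed limit, taken from the issue summary


def import_truncating(text, limit=MAX_PARA_LEN):
    """Model of the reported 1.0.3 behaviour: text past the limit is dropped."""
    return text[:limit]


def import_splitting(text, limit=MAX_PARA_LEN):
    """Model of the reported 1.1 RC behaviour: a break after every `limit`
    characters, even if that falls in the middle of a word."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


long_para = "word " * 30000  # one paragraph of 150000 characters
kept = import_truncating(long_para)
parts = import_splitting(long_para)

print(len(long_para) - len(kept), "characters lost by truncation")
print("split into paragraphs of length", [len(p) for p in parts])
```

In this model the split variant keeps all of the text, but the breaks land wherever the counter runs out, which matches the "even if it splits a word" observation.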
Comment 3 pmartel60 2003-07-23 22:36:29 UTC
OOo 1.1 RC handling of this case has been reported as Issue 17329.
Comment 4 lohmaier 2003-07-27 13:29:09 UTC
Please update the summary to reflect the actual issue before
confirming an issue. Changed summary, added ms_interoperability,
set target milestone.
original summary:
"Bad conversion, pages missing and wrong behaviour with an easy file."
Comment 5 lohmaier 2003-07-27 13:33:58 UTC
*** Issue 17329 has been marked as a duplicate of this issue. ***
Comment 6 lohmaier 2003-07-27 13:34:53 UTC
from issue 17329:
Importing an MS Word Doc with a paragraph longer than 65535 characters
causes a paragraph break to be inserted after every 65534 characters.
Discovered while trying unsuccessfully to reproduce the OOo 1.0.3
Issue 17171 on OOo 1.1 RC.
Comment 7 lohmaier 2003-07-27 13:36:55 UTC
Created attachment 8051
simpler doc-file to reproduce the problem.
Comment 8 h.ilter 2003-07-28 15:14:17 UTC
Reassigned to MIB
Comment 9 michael.brauer 2003-08-05 14:19:01 UTC
.
Comment 10 Oliver Specht 2003-08-21 07:15:23 UTC
*** Issue 18417 has been marked as a duplicate of this issue. ***
Comment 11 andreas.martens 2003-09-12 14:58:39 UTC
.
Comment 12 michael.brauer 2004-01-12 08:53:36 UTC
.
Comment 13 Oliver Specht 2004-01-12 10:11:37 UTC
.
Comment 14 lohmaier 2004-02-25 21:46:54 UTC
*** Issue 19969 has been marked as a duplicate of this issue. ***
Comment 15 lohmaier 2004-12-07 22:59:14 UTC
*** Issue 38603 has been marked as a duplicate of this issue. ***
Comment 16 michael.ruess 2005-01-03 07:21:29 UTC
*** Issue 39770 has been marked as a duplicate of this issue. ***
Comment 17 lohmaier 2005-01-20 19:19:41 UTC
*** Issue 41049 has been marked as a duplicate of this issue. ***
Comment 18 michael.ruess 2005-02-09 11:06:20 UTC
*** Issue 42283 has been marked as a duplicate of this issue. ***
Comment 19 ataraxia 2005-04-06 09:57:05 UTC
Created attachment 24725
An example document where it is not possible to write anything more
Comment 20 lohmaier 2005-06-12 20:47:42 UTC
*** Issue 50639 has been marked as a duplicate of this issue. ***
Comment 21 lohmaier 2005-06-15 14:11:44 UTC
*** Issue 50764 has been marked as a duplicate of this issue. ***
Comment 22 michael.ruess 2005-08-09 15:29:35 UTC
*** Issue 53093 has been marked as a duplicate of this issue. ***
Comment 23 eric.savary 2005-08-18 10:45:01 UTC
*** Issue 53473 has been marked as a duplicate of this issue. ***
Comment 24 eric.savary 2005-08-18 10:45:54 UTC
*** Issue 5276 has been marked as a duplicate of this issue. ***
Comment 25 anteru 2005-08-18 12:13:00 UTC
Is anyone working on this? It seems this bug has been known since 2003; is it likely to be
fixed anytime soon?
Comment 26 Oliver Specht 2005-08-19 07:55:57 UTC
No, it's not likely to be fixed soon. 
As you can see, it's targeted to OOo Later and is a Prio 4, which is absolutely
appropriate. 
Besides that, it's not easily fixable. 
Comment 27 lohmaier 2005-09-17 21:22:40 UTC
*** Issue 54720 has been marked as a duplicate of this issue. ***
Comment 28 Regina Henschel 2005-10-04 21:09:44 UTC
*** Issue 55464 has been marked as a duplicate of this issue. ***
Comment 29 allsorts46 2005-10-04 21:19:06 UTC
This is still a serious issue, as it causes loss of data which cannot be
recovered (by 'undo' or other means, except entirely reloading the document,
assuming you are fortunate enough to have noticed before you saved, overwriting
the original).

Please read Issue 55464
(http://www.openoffice.org/issues/show_bug.cgi?id=55464).

At the very least, Writer should not allow any actions that would cause this to
happen, and should present an error and explanation to the user. Simply 'losing' the
text is unacceptable.
Comment 30 yunkong 2005-10-25 12:05:13 UTC
*** Issue 54890 has been marked as a duplicate of this issue. ***
Comment 31 renatoyamane 2006-07-24 02:11:11 UTC
This bug is 3 YEARS old!?!
Some friends have problems with this bug, because some documents need to be
written in only 1 paragraph and this is impossible with "OOo Writer".
The problem is even greater when documents written in "MS Office Word" are
converted to "OOo Writer", because when exceeding 65534 characters, the text is
eliminated!
Please, don't forget this bug!
Comment 32 lohmaier 2006-08-02 22:55:36 UTC
1st: If you think changing this is so easy, then provide a patch or pay someone
to provide a patch
2nd: Tell me what percentage of users need such long paragraphs.
3rd: when importing, the text is not eliminated, but split into several paragraphs.
Comment 33 allsorts46 2006-08-04 10:28:39 UTC
1st: This kind of attitude is extremely unhelpful for anything. Obviously, the
majority of OO's end users are not capable of writing patches - that's what the
developers do. Nobody suggested it was easy, but writing an office suite isn't
easy either. Looking at some of the features which *are* being given priority,
it makes more sense to me that something as fundamental as being able to handle
text properly should be among them. Paying someone to provide a patch would
defeat one of the major advantages of OO - it's free. If I have to pay money for
a word processor not to lose my work, I'd rather spend it on purchasing a
commercial package of higher quality. Also, the longer this problem is ignored,
the more difficult it will become to change later. 

2nd: I think this is rather irrelevant. What percentage of users need Obscure
Feature X? A word processor should handle text, as much text as a user wants and
in the way the user wants to format it - everything else comes second.

3rd: However, any attempt to rejoin the split paragraphs results in immediate
loss of data, without warning, and without any undo information being saved.
Comment 34 lohmaier 2006-08-10 13:41:35 UTC
*** Issue 68340 has been marked as a duplicate of this issue. ***
Comment 35 renatoyamane 2006-08-12 00:50:57 UTC
Cloph,
you have an e-mail @openoffice.org, so I think that you work on OpenOffice.Org.
Unhappily, OpenOffice.Org has "employees" as ignorant as you!

One more piece of data:
ALL law offices need to write atas (I don't know how to translate this word into English, 
but "ata" comes from the Latin "ACTA" and means a "written record of what 
was done in a meeting").

Do you understand me?

This document ("acta") needs to be written in only one paragraph (it's a rule!), and in 
most cases this paragraph needs more than 65k characters, so it is impossible to use 
OpenOffice.Org.

So, I think this information answers your question 2.

Best regards,
Renato Yamane

Comment 36 eric.savary 2006-09-18 11:25:23 UTC
*** Issue 69580 has been marked as a duplicate of this issue. ***
Comment 37 aziem 2006-10-01 22:57:16 UTC
On OOo 2.0.4rc2 Linux, the statistics are wrong for the LongPara document.  The
statistics show 0 words and 0 characters.  Also, it is easy to hang
OpenOffice.org by using a combination of copy and paste, normal typing, and
backspace to create a really long paragraph (starting from a blank document).  

If this should be a separate issue, let me know.
Comment 38 michael.ruess 2006-12-04 09:47:06 UTC
*** Issue 72234 has been marked as a duplicate of this issue. ***
Comment 39 kpalagin 2007-02-07 23:38:45 UTC
*** Issue 74323 has been marked as a duplicate of this issue. ***
Comment 40 kpalagin 2007-03-25 17:41:44 UTC
Dear developers,
judging by number of duplicates the problem seems to be affecting a number of 
our customers. This issue is especially important in academic environment, 
where authors must follow the rules and simply can't insert paragraph breaks 
at whim of their wordprocessor.

The problem is aggravated by the fact that Writer simply loses data (without 
Undo) if the user attempts to remove an extra paragraph break.
(Needless to say that our competitors do not have this problem).

Upping the priority because our current behavior can cause data loss.
Please consider targeting for 2.3.
Thanks a lot for your attention.
Comment 41 eric.savary 2007-04-12 16:37:58 UTC
*** Issue 76313 has been marked as a duplicate of this issue. ***
Comment 42 jwr 2007-04-20 09:40:04 UTC
US and EU governments decided to use "Open Format" for all official documents
including deeds (law-documents - "acta") and patents ? -> OO will have to cope
with these documents or US and EU governments will prescribe some other tools...

Regards -Hans-
Comment 43 mike_hall 2007-06-20 11:34:51 UTC
Another scenario where this can happen is doing certain global edits to improve
formatting of a long document, which often require the temporary replacement of
end of paragraph marks to create a single paragraph document, which is highly
likely to exceed the limit. At present, I have to go back to Word in this situation.

Plenty of concern about this issue, which as it involves data loss should
arguably be P2 rather than P3. Target of OOo Later effectively is 'never'.
Please consider changing target at worst to 3.0, preferably to 2.4.
Comment 44 Regina Henschel 2007-10-20 18:37:43 UTC
*** Issue 82809 has been marked as a duplicate of this issue. ***
Comment 45 renatoyamane 2007-10-21 01:48:03 UTC
I have a question.
Is any OOo developer reading this issue?
This bug is very old (4 years)!
I follow another old bug
(http://www.openoffice.org/issues/show_bug.cgi?id=24969), more than 3
YEARS old, and when an OOo developer found it, he fixed the problem in only 3 DAYS!

Regards,
Renato
Comment 46 renatoyamane 2007-11-01 22:56:48 UTC
Can someone change the Target Milestone?
This bug is very important because data can be LOST, and I think that the "OOo Later"
target is very obscure (1 more month? 1 year? 100 years?)

Summary: All law offices need to write actas, and these documents are written in only 1
paragraph (it is a rule), so when we write more than 65534 characters, we LOSE data!

Regards,
Renato
Comment 47 Mathias_Bauer 2007-11-05 14:17:24 UTC
I understand that this issue is a killer for a few users. But as the defect is
in some central code (the old tools String class), fixing it would require
changing 90% of OOo's code base (wild guess). Even if we agreed on fixing the
problem for Writer (and I'm still not doing this), all other developers would be
affected as well. 

It would take several months in nearly all projects to do the change and fix all
warnings and bugs that slip in while doing the change. In the meantime nothing
else can be done on the code as the risk of running into merge problems is huge.
This would be an effective standstill of development for several months. 

I still don't see that this is justified by the undeniable benefit for a small
percentage of the user base. And especially we can't do it now, in the middle of
a lot of work that has already started. So the target is still valid as this
means that 3.0 is impossible.


Comment 48 jwr 2007-11-05 14:30:06 UTC
I fully understand this problem cannot be solved soon.
However I could imagine some intermediate options like:
- a solution to prevent crashes
- a solution to prevent data losses 
- a solution to warn users 
Maybe these intermediate options are helpful or sufficient for 90% of the users.
Regards -Hans-
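The "prevent data loss" and "warn users" options listed in Comment 48 amount to checking the combined length before two paragraphs are joined. A minimal sketch of such a guard, in Python rather than Writer's C++, is shown here; `join_paragraphs`, `ParagraphTooLong` and `MAX_PARA_LEN` are hypothetical names, and the limit of 65534 is assumed from the issue summary.

```python
MAX_PARA_LEN = 65534  # assumed limit, taken from the issue summary


class ParagraphTooLong(Exception):
    """Raised instead of silently dropping text (hypothetical name)."""


def join_paragraphs(first, second, limit=MAX_PARA_LEN):
    """Join two paragraphs only if the result still fits under the limit.

    If it does not fit, refuse the operation and leave both paragraphs
    untouched, so the caller can show a warning to the user.
    """
    combined = len(first) + len(second)
    if combined > limit:
        raise ParagraphTooLong(
            f"joined paragraph would have {combined} characters, "
            f"limit is {limit}"
        )
    return first + second


try:
    join_paragraphs("a" * 60000, "b" * 6000)
except ParagraphTooLong as err:
    print("Warning shown to the user:", err)
```

The point of the sketch is only that the check happens before any text is modified, so nothing is lost when the operation is refused.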
Comment 49 Mathias_Bauer 2007-11-05 15:34:50 UTC
Sure, I'm happy with doing that. Unfortunately this issue basically is not about
fixing the potential data loss and so we have to work on that outside of it.

If someone volunteered to dig out the crashes and data losses and moved them
to one or more separate issues we could try to fix them. 
Comment 50 mike_hall 2007-11-06 16:41:31 UTC
Was waiting to see if someone more experienced would volunteer... 

I'm happy to have a go at creating issues for the interim options over the next
two weeks. Comments on them would be appreciated in due course.

Comment 51 Mathias_Bauer 2007-11-06 17:24:51 UTC
Excellent! So in case we have reproducible scenarios where either OOo crashes or
document content gets lost unnoticed we should be able to prevent the disaster and
I will make sure that this will be fixed as soon as possible.

I'm sorry to disappoint users wanting to work with huge paragraphs, but I can't
help. 
Comment 52 ohallot 2007-11-06 19:15:38 UTC
It is very easy to prove that content gets lost unnoticed. In the attached
document (LongParagraphIllness.odt), go to page 15 (or the last page) and remove the
paragraph ending, joining it with the following one.

The second paragraph (or part of it) is lost with no further notice. 
Comment 53 ohallot 2007-11-06 19:17:11 UTC
Created attachment 49478
very long paragraph sample
Comment 54 superm401 2007-11-07 20:41:40 UTC
I've filed Issue 83427 (http://www.openoffice.org/issues/show_bug.cgi?id=83427),
which addresses the data loss issue.
Comment 55 superm401 2007-11-08 10:44:40 UTC
I just want to suggest that when this is fixed, the limit be changed from 2^16
to 2^64 (not 2^32).  Planning ahead is always good...
Comment 56 Mathias_Bauer 2007-11-08 11:04:00 UTC
Why only 64Bits and not 128? ;-)
OK, I hope you see what I mean: larger is not necessarily always better, you
must stop somewhere. And where to stop can be judged only by reasoning.

As our API to work on paragraphs makes it necessary that the whole paragraph text
fits into one String variable, it follows that the maximum length of a String
must match the maximum length of a paragraph. Even if I consider changing the
length of our UNO API String from 32Bit to 64Bit I don't see a clear benefit of
doing so (and the effort and pain doing this change would be huge!).

I have my doubts that using 64Bit integers for string length and indices is a
good idea as it will influence performance. Handling 64Bit variables on 32Bit
computers will result in a considerable slowdown. And what for? 32Bit will give
us paragraphs with 4.2 billion characters. Saving this uncompressed will result
in a file size of more than 8GB (2 Bytes per character) - only for one paragraph! 
I don't want to sacrifice performance for the ability to have paragraphs larger
than 4 billion characters, which very probably would never be needed.
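The figures in Comment 56 can be checked with a short calculation. The sketch below assumes unsigned lengths and the 2 bytes per character stated in the comment; `capacity` is a hypothetical helper name used only for this illustration.

```python
BYTES_PER_CHAR = 2  # as stated in Comment 56


def capacity(bits):
    """Return (max characters, uncompressed size in bytes) for an
    unsigned length field of the given width."""
    max_chars = 2 ** bits - 1
    return max_chars, max_chars * BYTES_PER_CHAR


for bits in (16, 32):
    chars, size = capacity(bits)
    print(f"{bits}-bit length: {chars:,} characters, {size:,} bytes uncompressed")
```

With these assumptions a 16-bit length allows 65,535 characters (about 128 KB of text), while a 32-bit length allows roughly 4.29 billion characters, or about 8.6 billion bytes for a single paragraph.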
Comment 57 jwr 2007-11-08 11:25:12 UTC
An upper limit for paragraph lengths must always be considered as a second best
choice. Issue 83427 reports Abiword does not have this 64k-limit-problem
in any form when working with ODT files. Is it possible for OOo-tools to adopt
the same technology?
Comment 58 superm401 2007-11-08 11:28:58 UTC
I don't think 2^64 is necessary.  But I think it will be reasonable by the time
this bug is fixed, since already almost all new processors are 64-bit.
Comment 59 Mathias_Bauer 2007-11-08 11:35:44 UTC
jwr: the 16Bit limit IMHO is just a bug; in OOo's API Strings have 32Bit length,
just Writer's internal implementation (that in its beginning dates back to 1992
or so) has this limitation. Removing it is desirable but quite some work to do.

This is different for the 64Bit case. Even if we assumed that in a few years most
computers will be 64Bit machines (I doubt that - at least for everything
outside of the "western" world) the other drawbacks like loss of UNO API
compatibility still remain.

Comment 60 simos.bugzilla 2007-11-08 12:50:56 UTC
Just to get an idea what 32-bit or 64-bit size paragraphs can give you:

"32-bit" size strings/paragraphs amount to about 4.3 billion characters (2^32).
The Encyclopaedia Britannica has 32 volumes and 44 million words, amounting to
about 1.4 million words per volume. If we assume that each word is about 10
letters (actually it is less), then a volume of Britannica (about 14 million
letters) fits well into a 32-bit size string, with a great deal of room to
spare.

Therefore, a volume of Encyclopaedia Britannica can fit in a 32-bit size string.
Considering that the OOo API strings are 32-bit, it would make sense to go for
32-bit. 

Comment 61 floris_v 2007-11-08 13:09:42 UTC
That sounds reasonable, but what if a clever person opens all volumes of the EB
into one document and deletes all paragraph markers? With computers getting more
and more powerful, that's just the kind of thing you can expect to happen sooner
or later.
Comment 62 superm401 2007-11-08 13:30:47 UTC
floris_v, that's exactly my point.  With 32-bit, it's possible to give examples
of content that wouldn't fit.  With 64-bit, it becomes absurd.  Not only can
64-bit fit the entire EB, it can fit over a billion Wikipedias (all languages
combined).
Comment 63 simos.bugzilla 2007-11-08 16:56:24 UTC
Guys, we have to put this in perspective.
My point was that 32-bit is just enough for a new limit for the size of a single
paragraph. I feel that 32-bit size for a single paragraph is fine.

If we try to push for 64-bit, my very good guess is that this issue will get
stalled and not resolved in the near future.

Since the OOo API for strings uses 32-bit numbers for the size, it appears
easier to get 32-bit paragraph sizes.

In typesetting terms, it looks like a fringe case to demand from a desktop text
editor to handle a single paragraph that consists of more than a volume of the
Encyclopaedia Britannica. 

It looks like a better strategy to focus on getting 32-bit paragraph sizes,
because the next option (still 16-bit size but just complain when going over the
limit) may not be ideal for the examples shown above.

If the string size is 32-bit and you use strings to represent paragraphs, then
in order to go to 64-bit paragraph size you need to make too many changes in the
program logic, check for regressions, etc, which does not appear to happen.

It would make sense to go for 64-bit sizes when the basic data types of OOo are
of 64-bit. This might happen in the next years, which would be a more
appropriate time to ask for 64-bit paragraphs as well.
Comment 64 floris_v 2007-11-08 17:10:59 UTC
Frankly, this whole issue confuses me. Why do you want a fixed maximum size for
a paragraph in the first place? Why do you want to store a single paragraph in a
string? IMHO that whole concept is flawed. I have in the past removed all
paragraph marks in Word documents, so that I got a document with one paragraph,
of over 64 Kb, and no problem at all. I'm not sure anymore why I needed to do
that, only that I did, and that it was very useful. Apparently the programmers
of Word didn't need a max length for paragraphs, and if they didn't, then why do
you?
Comment 65 Mathias_Bauer 2007-11-08 19:14:43 UTC
Folks, can we move that discussion elsewhere? The poor developers that later on
will work on that issue will have problems to find relevant information in this
"chat".

It's clear that 64KB for a paragraph is not enough and the OOo UNO API already
uses 32Bit wherever text is retrieved from a text object (e.g. paragraphs,
portions, selections etc.). This is not related to how a paragraph stores its
text internally. The way the text is stored in the implementation is not
the problem of this issue. 

We have an *internal C++ API* that is limited to 16Bit and *this* must be fixed.
And the problem is that this API is so widespread in OOo. 

The *external API* (UNO API) uses 32Bit and this will not be changed without a
valid reason. The ability to store the whole Encyclopaedia Britannica, the
Wikipedia and the whole Google cache in a single paragraph is *not* a valid
reason.

So please try to add only comments that describe new, currently unknown
situations where the 64KB limit bites users (situations that are not described
in this issue or its duplicates). Thank you.
Comment 66 kpalagin 2007-11-09 06:52:44 UTC
Here is another data-loss issue - 59185.
Comment 67 Mathias_Bauer 2007-11-09 08:53:05 UTC
Thanks for the hint; as issue 59185 looks a bit more complicated to fix I keep
the "3.x" target for now but we will reinvestigate the effort to see if we can
switch the target to 3.0.
Comment 68 eric.savary 2007-12-30 17:47:53 UTC
*** Issue 84902 has been marked as a duplicate of this issue. ***
Comment 69 aziem 2008-01-06 06:11:41 UTC
*** Issue 85007 has been marked as a duplicate of this issue. ***
Comment 70 eric.savary 2008-11-19 00:04:54 UTC
*** Issue 96176 has been marked as a duplicate of this issue. ***
Comment 71 renatoyamane 2008-11-19 16:53:33 UTC
I think that the correct "Priority level" should be P2 and not P3.
P2 is for an issue with "data loss".

And I think that a P2 priority needs an ASAP target, not "OOo Later".

"OOo later" means "When someone takes a look here" or "Never".

Best regards,
Renato
Comment 72 Mathias_Bauer 2008-11-20 09:51:41 UTC
That depends. If the requirement is about extending the size of paragraphs, it's
at best a P3.

If the requirement is to avoid data loss in case a paragraph is changed to get
over this boundary - well, yes, that would be a P2.

If everybody is fine with separating this, we could create a second issue and
make that a P2. 
Comment 73 superm401 2008-11-20 10:09:09 UTC
I filed a separate bug for the data loss issue
(http://www.openoffice.org/issues/show_bug.cgi?id=83427) and linked it here over
a year ago.  Not sure what more you're asking, mba.
Comment 74 Mathias_Bauer 2008-11-20 12:31:04 UTC
I just mentioned that if there was something that deserves a "P2" it would be
the data loss part of the problem, not the inability to work with larger paragraphs.

I just had forgotten that we already have been there and that the data loss has
been fixed meanwhile. So as long as nobody proves the opposite: there is no data
loss anymore and so P3 is enough.

In case someone finds a situation where data loss caused by our paragraph limit
still occurs: please do it like "superm401" and create a separate issue that can
be fixed with an earlier target. Fixing the data loss should always be possible
without extending the paragraph limit.
Comment 75 renatoyamane 2008-11-20 14:44:40 UTC
I don't agree with separating this bug into TWO.

IMHO:

1) Change priority of this bug to P2;
2) Fix problem about data loss if developers can't fix problem about "longer
paragraph";
3) Change priority of this bug to P3;
4) Fix problem filed in 2003 (longer paragraph).

This is not 2 bugs. This is only one bug:
- If you write a longer paragraph, data can be lost.

Why the hell do we need to open a new issue for:
- Data can be lost if you write a longer paragraph?

They are the *SAME*!!

What is the root cause? Longer paragraph!

If developers can't fix the "longer paragraph" problem, then do a workaround
to avoid data loss.

Best regards,
Renato
Comment 76 Mathias_Bauer 2008-11-20 15:38:26 UTC
Our QA won't accept a partial fix for this issue. 

So again: if somebody knows a case where the paragraph problem will create data
loss, please create a separate issue so that we can fix it. Or stay with only
one issue and wait, probably for a longer time, until we don't have more
important problems than the inability to work with paragraphs that are longer
than most of the documents our users create on average.
Comment 77 urmasd 2009-09-10 04:27:21 UTC
Still present in OOO310m11 (9399). Come on, it's a 6-year-old bug. Even Word 2 had
no such restriction, and it was a 16-bit application from 1992!
Comment 78 diegomann 2009-09-25 16:16:34 UTC
Hi!

I started this bug back on Monday, Jul 21 2003. 

I still have hope someone will take REAL care of this.

This is a real-world bug and it has a serious impact on users and on word
processing in general. I really want to stress this. It's not a theoretical thing.

Best Regards
DIEGO URRA

"All good things come to those who wait.", Ronin
Comment 79 renatoyamane 2009-09-26 18:39:14 UTC
Diego, IMHO the only way to fix this bug is to buy a Microsoft Office
suite.

This is the same as in other areas where it is necessary to use proprietary software,
like Autodesk Inventor, Solidworks, etc.

Law offices still need Microsoft Office software[1]. OOo can't be used there.
This is a critical bug for law offices, and a 6-year-old bug means "Live with
this bug".

[1] See comments:
* Fri Aug 11 23:50:57 +0000 2006
* Fri Apr 20 08:40:04 +0000 2007
Comment 80 michael.ruess 2009-12-03 12:34:17 UTC
*** Issue 107382 has been marked as a duplicate of this issue. ***
Comment 81 anoopshah 2010-01-02 16:35:11 UTC
I presume lawyers want to create documents that look like a single paragraph 
when printed and are not bothered about OOo's internal representation of the 
paragraph. For performance reasons it is better to keep paragraphs small, 
certainly much smaller than 60,000 characters.

What about the following solution:
Provide an additional paragraph style feature "merge with next paragraph". If a 
paragraph has this style, OOo is allowed to move the paragraph mark when 
repaginating / refreshing the document so that it always seems to merge into the 
next paragraph. When documents with large paragraphs are opened, they are 
automatically split into smaller paragraphs with this paragraph style feature. 
When saving in Microsoft Word format, the paragraphs can be merged back into a 
huge paragraph.

Alternatively, without a new format:
1. On detecting a very long paragraph approaching the limit, Writer will 
automatically insert a paragraph break and warn the user that it has done so.

2. To provide an extension or plugin for 'virtual paragraphs'. This program 
scans through the text in the current section (or selection) and moves all 
paragraph marks to the end of the nearest line, to give the illusion (when 
printed) that the document consists of one paragraph. It will check that there 
is zero extra linefeed before or after the moved paragraph marks. The program 
will not alter double paragraph marks, or marks between paragraphs with 
different text styles.
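The automatic splitting proposed in Comment 81 could, for example, place the inserted breaks at word boundaries rather than at a fixed character count. The following Python sketch shows one way to do that under stated assumptions: the function name `split_at_word_boundaries` is hypothetical, only the space character is treated as a word boundary, and nothing here reflects how Writer actually handles paragraphs.

```python
def split_at_word_boundaries(text, limit):
    """Split `text` into chunks of at most `limit` characters.

    Each break is placed just after the last space that fits in the
    chunk; if a chunk contains no usable space, it is split hard at
    `limit`. Joining the chunks restores the original text exactly.
    """
    chunks = []
    start = 0
    while len(text) - start > limit:
        # Last space inside the window text[start:start + limit].
        cut = text.rfind(" ", start, start + limit)
        if cut <= start:
            cut = start + limit  # no usable space: hard split
        else:
            cut += 1  # keep the space at the end of the chunk
        chunks.append(text[start:cut])
        start = cut
    chunks.append(text[start:])
    return chunks


paragraph = "lorem ipsum " * 20000  # 240000 characters in one paragraph
pieces = split_at_word_boundaries(paragraph, 65534)
print(len(pieces), "pieces, longest is", max(len(p) for p in pieces))
```

Because the chunks concatenate back to the original text, a merge on export (as suggested for saving in Microsoft Word format) would be a plain join in this model.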
Comment 82 mike_hall 2010-01-03 23:15:49 UTC
@anoopshah
Those are very imaginative approaches, but do they resolve the problem?

The difficulty is that the suggestions, particularly the first, add unnecessary
complexity and it's not obvious that the extra coding required would be less
than doing the job properly. Writer is perfectly capable of handling a document
with more than 65534 characters. Why can't a paragraph be paged in and out in an
analogous way to a document? That might improve rather than detract from
performance.

The bottom line is that there are users who need very large paragraphs, whether
because of the nature of the document or as an interim step in editing large
documents. A 100 page book isn't very long in this context.

Hard limits were common in early application software. Later, it was generally
accepted that limits should be imposed only by what the hardware can handle. I
understand that this may be a very difficult issue to fix within Writer, but it
seems inevitable that the limit must be removed one day. A professional product
would have this capability.

Comment 83 piduca 2010-05-20 14:24:58 UTC
Hello everyone.
(first of all, sorry for my very bad English)

Every week I need to write at least one "ata" - as renatoyamane said, this is
the Portuguese word from the Latin "acta", which is the name of a document in which
people record the events of an official governmental meeting.
Some of the developers and some of the people who follow this thread may think:
"Why would someone need a paragraph with more than 65535 characters?"... My
answer is: this is a necessity!!! And, contrary to what most of you may imagine, there are
many people who have this need!!!
"Actas" (I didn't find the English word for it either, renatoyamane) need to be
written in ONE PARAGRAPH, and as someone already said before, this is a RULE.
And there are many REASONS for this rule, and one of them is that, as an official
and very important document (usually written by governmental or law docs), it is
necessary to avoid the possibility of later changes to the formatted text
(possibly made with obscure and/or illegal/fraudulent intentions).
I am not saying that we need 64 bit long characters... But 32 bit would be
perfect!!! Why not use 32 bits?????
And for those who think that this reason isn't worth the developers' effort,
my answer is: other word processors have handled big paragraphs since the dinosaur
era, but unfortunately I need to go to M$ Word to handle my big "actas", and I
am VERY SAD about having to do this!!!
I AM STILL A FAN of OpenOffice, but you developers are forcing me to go
elsewhere, and this makes me MAD!!
Come on, guys!!! Doesn't a 7-year-old issue that is centered on a simple variable (a
question of 16-bit to 32-bit migration in a specific feature) deserve
your attention?? This is absurd, unacceptable!! (I know that there is an issue
of "internal limitation", but why does this limitation apply only to OO, and to no
other word processor since the "stone era"?)
Again, sorry for my bad English. I hope everybody understands my disappointment.
And I hope we can write "actas" in Open Office, the sooner the better.
Comment 84 Mathias_Bauer 2010-05-20 16:31:20 UTC
Your post is a perfect example why users and developers often talk past each other. 

If this would be only a simple variable change we would have done that years
ago. Unfortunately this is a huge effort as there is a lot of code that assumes
that a paragraph or a text range length fits into 16 bit integer variables. You
have to find and change all of them - that's quite a lot of work to do. So
please refrain from such statements without actually having studied the code
before. 

Of course huge effort alone never is a reason not to do something. But still
there are other things we consider to be more important we can do in the same
time frame. This can't be changed by telling us that this is "only a central
variable to change" though we know that this just isn't true.

Of course everybody is tempted to think that his/her requirements are the most
important ones. But even if you know hundreds or thousands of other users with
the same requirement, there may be hundreds or thousands of other users with
other requirements that compete with yours for the available development resources.

Comment 85 simos.bugzilla 2010-05-20 18:34:33 UTC
piduca: We all want to have this issue fixed. It is very important to be civil
and not appear condescending. I believe this was due to linguistic/cultural
differences. 

mba: This issue looks like one that should be examined by the release team in
order to figure out when to tackle it (for which OOo release). Can you please give
a pointer to the mailing list where we can ask to have this issue considered for
a future release?

In addition, is it possible to give some high level instructions for an
ambitious developer who would try to give it a go and modify the code
themselves? This developer should be able to compile OOo; is there a rough guide
on which files need changing?
Comment 86 floris_v 2010-05-20 19:38:41 UTC
Come on folks, piduca took the trouble to download and install this, probably
even spent time to register, only to find that it's no use for him. I can
imagine he's frustrated. I can also imagine the pain of having to hunt down all
references to paragraphs as 16 bit entities, as I recently migrated from Delphi
to Lazarus, only to find that my old 16 bit integers were suddenly interpreted
as 32 bit longint. That gave rise to an incredibly long list of compiler errors
and warnings, and some real bugs with 16 bit data stored on disk interpreted as
32 bits, so I had some overflow there.
Meaning to say: good luck with it!

Maybe somebody should communicate this problem to the marketing department. It'd
be nice if people were warned about this before they download the package.
Comment 87 Mathias_Bauer 2010-05-20 22:56:37 UTC
The number of files to change is huge.

To give you (or the ambitious developer reading here) an impression: we use an
object of class "String" to store the text of a paragraph. This class only
supports 16 Bit length and indices. [We have another string class that supports
32 Bit, but it is a read-only string class (part of the UNO C++ runtime library).]
We could change the String class to support 32 Bit length and indices. But this
class is one of the most used classes in OOo, so we would have to change code in
nearly every library of OOo that does not use the mentioned 32 bit string class
exclusively.

An alternative: use a separate String class just for paragraphs in Writer and see
how far it goes. But what should happen if someone selects such a paragraph and
pastes it into Calc? And what about the code that Writer and Calc or Draw share
and that also uses Strings (like e.g. the HTML filter)? Sooner or later we will
end up with extending string lengths and indices to 32 Bit all over the place.
So yes, we have to change the String class, there is no alternative.

Thus we have to investigate all code that touches, passes or reads a string and
look for usage of string length and indexing. Can you imagine how much code that
is in an office suite?

Technically, it would be necessary to change the String class, recompile the
code and look for integer cast warnings (as now 32 Bit integers are handed over
to calls or assignments that expect 16 Bit integers) and fix them by moving even
more code to support 32 Bit integer parameters. It's like throwing stones into
the water and following the waves they create.
And here we can only *hope* that nobody has used integer casts or even C style
casts to convert arbitrary integer variables to unsigned 16 Bit integers, the
type that is used for String length and indices, as then no warning would appear
that can point us to a possible problem. So most probably we would also have to
scan the code for integer casts and C style casts to USHORT etc. and investigate
this code.
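
To illustrate why explicit casts are the dangerous case, here is a minimal
sketch. It is not OOo code: StrLen16, lengthViaCast and lengthChecked are
hypothetical names, and std::u16string merely stands in for the String class.
The cast compiles without any narrowing warning, yet the length silently wraps
modulo 65536.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-in for the unsigned 16 bit length/index type.
using StrLen16 = std::uint16_t;

// The explicit cast suppresses any narrowing warning. For texts longer than
// 65535 code units the result wraps modulo 65536 without any diagnostic.
inline StrLen16 lengthViaCast(const std::u16string& text) {
    return static_cast<StrLen16>(text.size());
}

// A checked variant (assumption: debug builds with assertions enabled) that
// fails loudly instead of returning a wrapped length.
inline StrLen16 lengthChecked(const std::u16string& text) {
    assert(text.size() <= std::numeric_limits<StrLen16>::max());
    return static_cast<StrLen16>(text.size());
}
```

A 70000-character paragraph passed to lengthViaCast reports a length of 4464
(70000 - 65536), which is exactly the kind of place a recompile alone would
not reveal.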

Can you imagine how much work this would be in several million lines of code? At
least several months - if we don't do anything else. This is comparable to the
switch from 8 Bit characters to 16 Bit characters that we made before we open
sourced the code - so we have some data for comparison. And we had much less
code 10 years ago...

Is this issue worth stopping all other OOo development for several months? Don't
you think that most users will think that we are nuts?

But that's only engineering. As we can't be sure that we have found and
converted all places, we would have to do intensive testing with many documents
containing huge paragraphs everywhere and apply all possible functions to them.

IMHO we can invest our time in other areas with more benefit for the project.
It's not impossible, but very, very much work.

That's the reason why this issue has got the target "later". It's not the last
word about it, but it's the status quo.
Comment 88 mike_hall 2010-05-21 08:20:14 UTC
The tone of the discussion here has changed a little from 'this isn't something
that is needed' to 'this is very difficult'. Thanks for that.

Here are some observations:
- the target "OOo later" can generally be interpreted as 'never'
- on the other hand, as the previous change from 8 to 16 bits shows, this is
something that will be needed eventually for a professional product
- it seems intuitive that in high quality code, size should be set by a
parameter, as even 32 bits is unlikely to be the final buffer size

To what extent would it be practical to start preparing for the change, so that
it was much less significant when eventually attempted? For example, would it be
practical to slowly convert the code so that the current 16 bit implementation
is based on a parameter? If it was, the amended class and methods could be
introduced slowly by all developers as they enhance code for other reasons. It
would also allow anyone who was particularly interested in making this change to
start preparation for it within the standard code and it may well be that some
automation of the changes might be possible. It would also allow experimental 32
bit buffer builds and testing. After some time, regrettably probably several
years, the remaining conversion would be much less of a problem, though as mba
clearly explained, it would not be insignificant. If nothing is done, the
problem can only become harder and harder.
Comment 89 floris_v 2010-05-21 08:47:26 UTC
A simple parameter won't work here. What would help a lot to prevent this kind
of trouble in the future, however, is a strict ban on explicit size typecasts.
Maybe you should introduce a special StringSize type for any typecasts - it's a
pity that you can't ban typecasts entirely, they're a big pain. Then when the
size needs to be doubled again, all you have to do is redefine StringSize and
recompile. And then pray that the same thing doesn't pop up in extensions, which
are entirely beyond the control of the developers.
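
A rough sketch of that idea, under stated assumptions: StringSize is the
hypothetical type name from this comment, and kNotFound and findChar are
invented here for illustration only. Every length and index goes through the
one alias, so widening means editing a single line and recompiling.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical single point of definition; change to std::uint32_t to widen.
using StringSize = std::uint16_t;

// Sentinel for "no match": the largest value StringSize can hold.
constexpr StringSize kNotFound = std::numeric_limits<StringSize>::max();

// Returns the index of the first occurrence of ch, or kNotFound.
// All index arithmetic uses StringSize, never a hard-coded 16 bit type.
inline StringSize findChar(const std::u16string& text, char16_t ch) {
    assert(text.size() < kNotFound);  // the text must fit the index type
    const StringSize len = static_cast<StringSize>(text.size());
    for (StringSize i = 0; i < len; ++i) {
        if (text[i] == ch) {
            return i;
        }
    }
    return kNotFound;
}
```

This only helps where code actually uses the alias; anything written against a
plain unsigned 16 bit type still has to be found by hand.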
Comment 90 Mathias_Bauer 2010-05-21 12:54:00 UTC
From my POV this was never a "nobody needs that".

Please don't mix the step from 8 bit characters to 16 bit characters with 16 bit
string length vs. 32 bit string length. These are completely different topics
and the necessity to do the first conversion is not related to the other one.

Of course the String class uses a special type for length and indices. But there
is other code that was written in the last 15 years that works with strings and
in many places this explicit type is not used. 

I already thought about a possible way to reduce the effort over time. It's
interesting that I had the same idea as you: if everyone who comes across code
dealing with Strings has a short look and replaces all usage of unsigned 16 bit
variables by the special type "xub_StrLen", we could reduce the effort over
time. Sounds like something I could try to advertise amongst the developers.
Comment 91 Mathias_Bauer 2010-05-21 12:57:50 UTC
Extensions already use 32 bit string length as our UNO API uses the mentioned
other string class. So we will be safe with them until anybody wants to
transport more than 4 GB of data with a single string variable.
Comment 92 hdu@apache.org 2010-05-21 14:52:02 UTC
> replaces all usage of unsigned 16 bit variables by the special type "xub_StrLen"

Yes, this is the important first step. C++ could help with finding all these
places if xub_StrLen were a class that no longer provided an implicit conversion
to an integral type once a module has been fixed (similar to the gradual changes
for warning code, where unchanged modules were marked with
EXTERNAL_WARNING_NOT_ERRORS).

The concept of strlen is so ambiguous that its use cases would benefit from some
clarification. Using the helper class suggested above, the different use cases
could be explicitly differentiated by providing methods such as
getStrBufferSize(), getUTF16Count(), getUTF32Count()

Finally the concept of strlen should be replaced by an iterator-based approach...
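
A minimal sketch of such a helper class, assuming std::u16string as the
underlying text: the class name StrLen is hypothetical, only the accessor names
getUTF16Count() and getUTF32Count() come from the comment above, and
getStrBufferSize() is left out. Because there is no conversion operator, code
that still treats the length as a plain integer fails to compile.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical length wrapper without any implicit conversion to an integer.
class StrLen {
public:
    explicit StrLen(const std::u16string& text)
        : utf16Count_(text.size()), utf32Count_(countCodePoints(text)) {}

    // Number of UTF-16 code units.
    std::size_t getUTF16Count() const { return utf16Count_; }
    // Number of Unicode code points (a surrogate pair counts once).
    std::size_t getUTF32Count() const { return utf32Count_; }

private:
    static std::size_t countCodePoints(const std::u16string& text) {
        std::size_t count = 0;
        for (std::size_t i = 0; i < text.size(); ++i) {
            const char16_t unit = text[i];
            const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
            if (isHigh && i + 1 < text.size()) {
                const char16_t next = text[i + 1];
                if (next >= 0xDC00 && next <= 0xDFFF) {
                    ++i;  // skip the low surrogate of a valid pair
                }
            }
            ++count;
        }
        return count;
    }

    std::size_t utf16Count_;
    std::size_t utf32Count_;
};

// Old-style "length as integer" code no longer compiles against this type.
static_assert(!std::is_convertible<StrLen, std::size_t>::value,
              "StrLen must not convert implicitly to an integer");
```

Callers have to state which count they mean, which is the clarification the
comment asks for.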
Comment 93 michael.ruess 2010-06-01 08:44:15 UTC
*** Issue 111982 has been marked as a duplicate of this issue. ***
Comment 94 eric.savary 2010-07-07 10:51:31 UTC
*** Issue 112997 has been marked as a duplicate of this issue. ***
Comment 95 michael.ruess 2010-07-07 11:33:02 UTC
*** Issue 113000 has been marked as a duplicate of this issue. ***
Comment 96 diegomann 2013-02-01 16:56:01 UTC
Hi!

The bug is soon going to be a TEN-year-old bug... I hope I will have enough time to fix it myself.

Cheers
DIEGO URRA BAUMGARTNER
Comment 97 martg 2013-10-05 00:24:20 UTC
Um, Happy Very Late Birthday, Bug 17171...