Apache OpenOffice (AOO) Bugzilla – Issue 89042
Word count is incorrect with certain special characters in the text (i.e. custom quotes, dashes)
Last modified: 2017-05-20 10:30:46 UTC
Here's the test case: (a ‘salvage function’) counts as three words in MS Word, but four words in the latest OOo beta. However, if you remove the word 'a', it is counted by both programs as only two words.
Reassigned to SBA.
Confirming issue with DEV300_m24. But it is not the "nested special characters": when I remove the brackets, nothing changes. I will attach a bugdoc with some examples. It looks like the custom quotation marks are the key to OOo's miscount. Adjusting summary. Reassigned to FME. Put khong on cc.
Created attachment 55130 [details] Bugdoc with different quotation marks that irritate word count
Bug still exists in 3.0 (OOO300m9 build 9358). Fairly annoying bug for NaNoWriMo participants. It is definitely the custom quotes; replacing them in a document corrects the word count.
*** Issue 97116 has been marked as a duplicate of this issue. ***
*** Issue 100629 has been marked as a duplicate of this issue. ***
*** Issue 102270 has been marked as a duplicate of this issue. ***
*** Issue 107241 has been marked as a duplicate of this issue. ***
This is a really annoying problem. To replace all the quotes in a file is not really a solution. Especially if it is a 300+ page document.
Reassigned to TL.
*** Issue 99131 has been marked as a duplicate of this issue. ***
Counting non-characters as words must be solved "all at once". Keeping a separate issue for each miscounted symbol does not make much sense. Mentioning dashes in summary from duplicate of duplicate issue. Example:
Sed lacinia arcu non diam sodales porttitor. [word count: 7]
- Sed lacinia arcu non diam sodales porttitor. [word count: 8]
- - - Sed lacinia arcu non diam sodales porttitor. [word count: 10]
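The counts in the example above can be reproduced with a short sketch. This is not the OOo implementation; the function names are hypothetical, and the only assumption is that a "word" should contain at least one letter or digit. A plain whitespace split counts each standalone dash as a word, while the letter-aware variant ignores it.

```python
def naive_count(text):
    # Every whitespace-separated token counts, including a lone "-".
    return len(text.split())


def letter_aware_count(text):
    # A token counts only if it contains at least one letter or digit.
    return sum(1 for tok in text.split() if any(ch.isalnum() for ch in tok))


sentence = "Sed lacinia arcu non diam sodales porttitor."
for prefix in ("", "- ", "- - - "):
    sample = prefix + sentence
    print(naive_count(sample), letter_aware_count(sample))
```

The naive count grows with each leading dash, whereas the letter-aware count stays the same for all three lines.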
*** Issue 108072 has been marked as a duplicate of this issue. ***
Similar problem found in v3.1.1, with the start quote on words in inverted commas. These are the AutoCorrect -> Custom Quotes -> Single quotes -> Start quote -> U+2018 quotes. No problem with the non-AutoCorrected quotes. Example: crimes counts as 1 word; crimes' counts as 1 word; 'crimes' counts as 2 words. It doesn't seem to matter how many words are within the quotes, the first always counts as an extra word.
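For test cases, the expected counts can be written down as a small reference counter. This is a sketch under stated assumptions, not OOo's or MS Word's actual algorithm: the hypothetical `count_words` treats the ASCII apostrophe and U+2019 as part of the surrounding word, treats every other non-alphanumeric character (including the U+2018 start quote) as a separator, and counts a run only if it contains a letter or digit.

```python
APOSTROPHES = "'\u2019"  # ASCII apostrophe and right single quotation mark


def count_words(text):
    count = 0
    run_has_alnum = False
    # The trailing space is a sentinel that flushes the final run.
    for ch in text + " ":
        if ch.isalnum():
            run_has_alnum = True
        elif ch in APOSTROPHES:
            continue  # stays inside the current run, never starts a word alone
        else:
            if run_has_alnum:
                count += 1
            run_has_alnum = False
    return count


print(count_words("crimes"))
print(count_words("crimes\u2019"))
print(count_words("\u2018crimes\u2019"))
print(count_words("(a \u2018salvage function\u2019)"))
```

Under these assumptions the start quote never adds a word. Note that this simple rule also splits hyphenated compounds such as "well-known" into two words, which may not match what other programs do.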
I have 2.4.1 installed on Ubuntu 8.03.3 and it seems to work correctly. I also have 3.1.1 installed on Ubuntu 9.10, and there opening quotes are counted as extra words. For example, "Hello," said Bob. "How are you?" would be counted as eight words. I'm not a very good programmer, but I'm willing to help with test cases and figuring out behavior.
*** Issue 112259 has been marked as a duplicate of this issue. ***
Punctuation characters as well as custom quote characters in combination with a non-breaking space should be handled differently. Please take this behavior into consideration together with a feature that has been implemented in DEV300m81 for several French document locales. See http://wiki.services.openoffice.org/wiki/Non_Breaking_Spaces_Before_Punctuation_In_French_%28espaces_ins%C3%A9cables%29
*** Issue 113375 has been marked as a duplicate of this issue. ***
A similar case, not yet mentioned: when "Replace Dashes" is turned on (AutoCorrect -> Options), OOo undercounts. For example, the sentence: Bob--as usual--disagreed. ... is correctly counted as 4 words with "Replace Dashes" turned off, but as only 2 words with "Replace Dashes" on, when each pair of hyphens is replaced with a single em dash.
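The dash case can be illustrated the same way. The sketch below is an assumption about the desired behavior, not the OOo code: the hypothetical `count_with_dash_breaks` treats whitespace, a double hyphen, an en dash (U+2013) and an em dash (U+2014) all as word separators, so the count is the same before and after the AutoCorrect replacement. A whitespace-only split is shown for comparison, since it yields the reported undercount on the replaced text.

```python
import re

# Separators: whitespace, "--", en dash, em dash (one or more in a row).
SEPARATORS = re.compile(r"(?:\s|--|[\u2013\u2014])+")


def count_whitespace_only(text):
    return len(text.split())


def count_with_dash_breaks(text):
    # Drop the empty strings re.split produces at the edges.
    return sum(1 for part in SEPARATORS.split(text) if part)


typed = "Bob--as usual--disagreed."
replaced = "Bob\u2014as usual\u2014disagreed."

print(count_with_dash_breaks(typed), count_with_dash_breaks(replaced))
print(count_whitespace_only(replaced))
```

A single hyphen is deliberately left out of the separator set, so a compound such as "well-known" still counts as one word.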
Still causing major errors in the Word Count. I've been able to check against other programs, and the discrepancies are at a rate of 1 in 30, compared to differences between other programs of around 1 in 1000 words. Custom quotes and dashes have very obvious effects, and I'm seeing it in Windows and Linux.
Fixed in CWS tl84. Fixed means: the word count for text with typographical quotes (single and double quotes) as listed in the attached bugdoc now behaves like MS Word 2007 again. In particular, this means that French « savoir calculer » is still counted as 5, since I was told the main 'feature' of the word count implementation is to give the same result as MS Word.
TL->SBA: Please verify.
Correcting target (from 3.x to 3.4).
->mru: So does your post mean that this bug fix will be rolled into v3.4? I see v3.3 is going through release candidates and I assume the OO developers don't want to complicate the release process with more bug fixes, but how about v3.3.1 then? Thanks!
@scottydm: so far there is no 3.3.1 target, and in any case this bug is not severe enough to be fixed in a micro release. 3.4 is OK.
Verified in CWS tl84.
*** Issue 117160 has been marked as a duplicate of this issue. ***