Issue 101224 - common autocorrect replacement table for certain languages subgroups (i.e. en-US, en-GB)
Status: ACCEPTED
Alias: None
Product: General
Classification: Code
Component: code
Version: OOo 2.0
Hardware: All All
Importance: P3 Trivial with 10 votes
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-04-20 14:39 UTC by tommy27
Modified: 2017-05-20 10:47 UTC
CC List: 5 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Description tommy27 2009-04-20 14:39:30 UTC
I had this idea for a new feature request while discussing issue 87672

i was thinking about having a “common autocorrect replacement table”  that 
works over same language subgroups... 

OOo has indeed separate replacement tables for UK English, USA English, AUS 
English... (acor_en-US.dat , acor_en-GB.dat etc. etc.) 
it has also “Italian (Italy)” and “Italian (Swiss)”...  (acor_it-IT.dat,  
acor_it-CH.dat)

however, at the moment they do not share entries...
if you add an entry to the UK table (e.g. computre --> computer)
it won't correct that mistake when you are writing a US English document.
If you want that autocorrection everywhere you have to create an entry for it in
each English subgroup.

Do you think it is possible to create a new common
replacement table for all of the “same language subgroups”?


Currently there is the “universal replacement table” acor_.dat whose entries are
applied in any language you are writing in.

it would be great to have something similar restricted to certain language
groups.
something like:
acor_en-ALL.dat working on all of the UK, US, AUS etc. english variants
acor_it-ALL.dat working on both the italian (Italy) and italian (Swiss) variants
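
The naming idea above can be sketched in a few lines of Python. This is only an illustration, not OOo code: the function name `candidate_tables` is made up, and the `acor_<language>-ALL.dat` file is the table proposed in this issue, which OOo does not provide. The order of the list (most specific first) is also an assumption.

```python
def candidate_tables(locale: str) -> list[str]:
    """Return the replacement-table file names to consult for a locale tag.

    The list runs from the most specific table to the most general one.
    """
    language = locale.split("-")[0]  # "en-GB" -> "en"
    return [
        f"acor_{locale}.dat",        # existing per-locale table, e.g. acor_en-GB.dat
        f"acor_{language}-ALL.dat",  # proposed language-wide table (hypothetical)
        "acor_.dat",                 # existing universal table
    ]


print(candidate_tables("en-GB"))
# ['acor_en-GB.dat', 'acor_en-ALL.dat', 'acor_.dat']
```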
Comment 1 stefan.baltzer 2009-04-20 16:08:02 UTC
At times these "language variants" are treated like different languages. Users
that rely on different "English writing sources" will run into different variants
(UK, US, ZA, CAN, ..) without actually noticing it, and collect data into
different AutoCorr tables.

Note: Using one AutoCorr table for several languages is similar to the need of
re-use of other linguistic features (spellcheck, hyphenation, thesaurus) to use
for other "User-defined English variants" that do not have the support yet.
 
At least, for updating a certain user-defined dictionary ("adding up words to
the personal spellchecker setup"), it can be set to "ONE language" or "ALL
languages".

Reassigned to OS, add TL on C/C.
Comment 2 tommy27 2009-04-21 07:37:42 UTC
i think that you should think of a way to make OOo share the
autocorrect feature among "same language" variants
Comment 3 tommy27 2009-05-09 08:30:49 UTC
I'm thinking of an option to enable/disable sharing entries between "same 
language subtypes".

Option enabled: 
acor_en-US.dat entries will apply to any english written document (whether
I'm writing with en-GB, en-ZA, en-CAN, en-whatever)

Option disabled:
acor_en-US.dat entries will apply only to en-US written documents
acor_en-GB.dat entries will apply only to en-GB written documents

users could then decide whether they prefer to have separate, independent
autocorrection replacement tables or mutually shared entries.
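
A rough Python sketch of the enabled/disabled behaviour described above. Everything here is hypothetical: the function `applicable_tables`, the `share_within_language` flag and the sample entries are invented for illustration, and plain dictionaries stand in for the .dat files.

```python
def applicable_tables(doc_locale, tables, share_within_language):
    """Pick the replacement tables that apply to a document.

    tables maps a locale tag such as "en-US" to a dict of typo -> correction.
    share_within_language is the hypothetical on/off option from this comment.
    """
    if not share_within_language:
        # Option disabled: only the table of the document's own locale applies.
        return {loc: t for loc, t in tables.items() if loc == doc_locale}
    # Option enabled: every table with the same language prefix applies.
    language = doc_locale.split("-")[0]
    return {loc: t for loc, t in tables.items() if loc.split("-")[0] == language}


# Made-up sample data standing in for acor_en-US.dat, acor_en-GB.dat, acor_it-IT.dat.
tables = {
    "en-US": {"computre": "computer"},
    "en-GB": {"yrllow": "yellow"},
    "it-IT": {"qeusto": "questo"},
}
shared = applicable_tables("en-GB", tables, share_within_language=True)
separate = applicable_tables("en-GB", tables, share_within_language=False)
print(sorted(shared))    # ['en-GB', 'en-US']
print(sorted(separate))  # ['en-GB']
```

The sketch does not decide what happens when two shared tables contain conflicting entries for the same word.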

This would also allow users to increase the available number of autocorrect
entries per language (as you know there's a 65535 entry limit for each dat file
that causes crashes and data loss: see Issue 87672).

You could indeed have 65K entries in acor_en-US.dat, another 65K in the  
acor_en-GB.dat etc. etc.

do you think that such an option could be integrated in OOo?
Comment 4 tommy27 2009-05-09 19:03:43 UTC
Again on this issue...
let's take the example of the italian language.
Two versions exist for it: italian (Italy) and italian (Switzerland).

Each one has its own replacement table file: acor_it-IT.dat and acor_it-CH.dat

why can't OOo have a third replacement table called acor_it.dat which works
on both italian-Italy and italian-Switzerland documents?

I tried to make a copy of acor_it-CH.dat file and renamed it to acor_it.dat 

unfortunately when i open the replacement table and scroll through all the
available languages, that file is not recognized by OOo.


Comment 5 Oliver Specht 2009-05-10 16:56:05 UTC
->tommy27: I know you want to get more than 64K entries. But to allow the
addition of different subtypes to reach that goal is the wrong way. I don't know
the difference between it-IT and it-CH. But the example with en-US and en-GB is
actually a bad idea. Different spelling and sometimes the use of different words
is _the_ reason to have such different language subtypes. Merging them will
break that. 
Comment 6 tommy27 2009-05-10 18:10:22 UTC
-> os:
thank you for your response. 
I really appreciate that you are taking the time to answer me.

First of all there's actually no difference between Italian spoken in Italy
and Italian spoken in Switzerland so having 2 separate acor.dat files looks 
redundant to me.

You are right however to point out that this is not the situation of GB and US 
english which have some different spelling cases (e.g. color / colour).
I agree with you that this subtype policy must be kept because of the minority 
of words that behave like this.

For example i could set a:
- “colour -> color” entry in the acor_en-US.dat file and a
- “color -> colour” entry in the acor_en-GB.dat file

So forget about my previous request about “sharing of entries” in those
separate en-GB and en-US .dat files. Each .dat file should have its own
autocorrect list.

there's however the vast majority of words that have exactly the same 
spelling... let's take an example: “yellow” which is the same in England, USA, 
South Africa, Australia, Canada etc. etc.

if you make a typing error like “yrllow” you have to set an autocorrect
entry in each of the localized english .dat files... it would be too time
consuming...

It would be much more user friendly and time saving to have a “non localized”
acor_en.dat file whose entries are shared by all english subtypes.

You see? I'm not asking to merge the GB and US acor.dat files anymore... 
I'm asking to add an additional “non localized” database.
This would have the advantage to handle autocorrection of the vast majority of 
common spelling english words regardless of the regional language subtype of 
documents. 
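
The layered lookup requested here could look roughly like the following Python sketch. The three dictionaries are stand-ins for the .dat files, and the `autocorrect` function and its lookup order (locale table first, then the language-wide table, then the universal one) are assumptions made for illustration; nothing in this issue specifies a precedence.

```python
# Stand-ins for the replacement tables discussed in this issue.
UNIVERSAL = {"2oo7": "2007"}                   # acor_.dat (exists today)
LANGUAGE_WIDE = {"en": {"yrllow": "yellow"}}   # proposed "non localized" acor_en.dat
PER_LOCALE = {
    "en-US": {"colour": "color"},              # acor_en-US.dat
    "en-GB": {"color": "colour"},              # acor_en-GB.dat
}


def autocorrect(word, locale):
    """Return the replacement for word, or word itself if no table has an entry."""
    language = locale.split("-")[0]
    layers = (
        PER_LOCALE.get(locale, {}),        # most specific: regional spellings
        LANGUAGE_WIDE.get(language, {}),   # shared by all subtypes of the language
        UNIVERSAL,                         # applies to every language
    )
    for table in layers:
        if word in table:
            return table[word]
    return word


print(autocorrect("yrllow", "en-GB"))  # yellow
print(autocorrect("yrllow", "en-US"))  # yellow
print(autocorrect("colour", "en-US"))  # color
print(autocorrect("colour", "en-GB"))  # colour
```

With this arrangement the regional tables stay small and only hold the words whose spelling really differs between subtypes.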

Moreover as a side effect it would give the user additional room for 
autocorrections since another 65K could be added in that .dat file, and I don't 
think it would be a wrong way to operate. i find it a nice workaround.

I know OOo has a global acor_.dat database whose entries work in any language
from arabic to swahili...  i find it useful for typing errors about numbers like
“year 2oo7 -> year 2007”.

so I'm wondering: if you were able to set a .dat file for all languages,
could you also set a .dat file for all english subtypes (and others for every
other language for which many subtypes exist, like italian, french, german,
spanish etc.)?
Comment 7 tommy27 2009-05-14 14:29:37 UTC
any feedback about my last post?
Comment 8 Oliver Specht 2009-05-14 15:17:45 UTC
Well, it is possible. 
Comment 9 tommy27 2009-05-14 20:16:56 UTC
great!!! i love your last reply!!!

if it can be done, i think it should be done for english, italian, french,
german, spanish and all those languages that come in different subtypes.

i think it would be a great new feature for OOo.
is this hard to implement?
Comment 10 Oliver Specht 2009-05-15 07:23:10 UTC
I'd estimate a week of work.
Comment 11 tommy27 2009-05-15 07:57:58 UTC
great!!! great!!! great!!!
i appreciate your interest in this issue.
i will stay tuned waiting for news about it.
Comment 12 tommy27 2009-05-22 15:48:34 UTC
@os

any news about your progress fixing this issue?

Comment 13 tommy27 2009-05-27 13:13:27 UTC
@os
really no news about this issue?
i'm just curious to see if any progress or testing has been done in the last 10
days. thanks.
Comment 14 Oliver Specht 2009-05-27 13:24:33 UTC
->tommy27: There is no sense in regularly asking for progress. There are lots of
issues around that are on 3.2 target and are not in progress right now. 
Comment 15 kpalagin 2009-05-27 13:58:36 UTC
tommy27,
enhancements usually do not progress over a couple of days.
Please do not make devs angry by being so pushy. I would even say that every
two months is too often.
Instead, get people to vote for this (and other) issues.
Comment 16 tommy27 2009-05-27 15:43:27 UTC
Sorry guys,
it was not my intention to be pushy...
regarding my requests to os, i misunderstood his previous post.

He said he would estimate a week of work to fix this issue and i thought he
meant a week from now... that's why i asked him for news a week later.
Now i realize that he only said he needed a week but he did not say he would 
start doing it immediately...

I didn't notice that the issue status was still NEW and not STARTED.... my 
fault!!!
So, please accept my apologies.
Comment 17 Oliver Specht 2009-05-28 06:13:01 UTC
->tommy27: Just to make sure: 'Started' doesn't mean that the work has been
started. It means 'accepted'.
BTW: state now changed to started
Comment 18 tommy27 2010-03-06 09:32:50 UTC
hi os, how are you?
almost a year has passed since my last post.

i'm writing here again since, as you know, i already have 65k italian
autocorrections stored in the acor_.dat file (all-languages autocorrections)
and then i started filling the acor_it-IT.dat (italian language for ITALY).

actually i'm very close to the 65k limit of that acor_it-IT.dat as well (last 
count was 62673 entries).

this means that i'm very close again to the crash limit.

for this reason it would be a blessing for me to have the
acor_it-non_localized.dat file we talked about in previous posts.

this would give me several advantages:

1- it would allow me to enter another 65k autocorrect entries... that means 
another 2 years of bad typing to be autocorrected by OOo

2- it would allow me to have a single acor.dat for english regardless of
sublocalization (en_US, en_GB, en_AU) in which i could store all the typing
errors that are common to all english variants (e.g. yrllow --> yellow) and use
the localized .dat files only for those words with variable spelling (e.g. color
in american english --> colour in british english)


do you think this STARTED issue could be FIXED in OOo 3.3?

thanks again for your patience and help.

 



Comment 19 tommy27 2010-03-10 21:05:50 UTC
I wanna thank all the people who recently voted for this issue
Comment 20 al_idrisi 2010-04-04 17:38:33 UTC
If the issue is still under development: It'd be very useful to open this
feature *not* only to "language variants."

I use many autocorrect entries across languages, for example names or
abbreviations of journal titles, etc. (If you cite the author, you'll use the
name & journal/book in whichever language you use.) The same goes for many (more
obscure) geographical locations, company names, titles of music albums, etc.--if
we try, we'll come up with a lot more examples--and even for substituting
special characters (e.g., I always like to have an autocorrect function for
Greek letters without having to add a Greek keyboard) or stuff like quickly
adding somewhere a line with dashes.

Besides, texts get sometimes written by mistake with the wrong language setting;
unless you "check spelling/grammar as you type," you will probably never notice
until you make a final spell check, but in the meantime you can easily add a
word to the wrong language file if you don't pay attention.

(And I'd think that these issues will increase in the next few years, 1.)
because OO will be used by more international users, and 2.) more and more users
will use more than one language. So I wonder if it's easier to introduce a more
flexible solution now rather than changing it again soon.)

So I guess it'd be optimal then if a user could choose where to add a new entry:
- to the current language
- to all languages of this variant (e.g., "all English")
- to all languages
- or to languages he selects (drop-down menu which would allow clicking on
multiple dictionaries)
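
The four choices listed above could be modelled along these lines. This is a Python sketch only: the `Scope` enum, the `target_locales` function and the list of installed locales are invented for illustration and do not correspond to anything in OOo.

```python
from enum import Enum


class Scope(Enum):
    CURRENT = "current language"
    VARIANTS = "all variants of this language"
    ALL = "all languages"
    SELECTED = "languages picked by the user"


def target_locales(scope, current, installed, selected=()):
    """Return the locale tags whose tables should receive a new entry."""
    if scope is Scope.CURRENT:
        return [current]
    if scope is Scope.VARIANTS:
        language = current.split("-")[0]
        return [loc for loc in installed if loc.split("-")[0] == language]
    if scope is Scope.ALL:
        return list(installed)
    # Scope.SELECTED: keep only installed locales the user ticked.
    return [loc for loc in installed if loc in selected]


installed = ["en-US", "en-GB", "it-IT", "it-CH"]  # made-up example list
print(target_locales(Scope.VARIANTS, "en-GB", installed))
# ['en-US', 'en-GB']
print(target_locales(Scope.SELECTED, "en-GB", installed, selected={"en-GB", "it-IT"}))
# ['en-GB', 'it-IT']
```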

I realize that this is more work, so I'm really just suggesting this solution...
even though, as I said before, I suspect that it might be work done "for the
future" :o)

Either way, thanks for taking care of any of this!
Comment 21 tommy27 2010-04-06 07:35:37 UTC
@al_idrisi

some of the ideas that you brought here look very interesting, especially the
ability to assign an autocorrect entry to multiple databases.

this will probably need another issue to be opened as a new feature request.

in the meantime I kindly ask you to add your vote to the current issue.
we already collected 10 votes. the more votes the issue gets, the better the
chance it will be fixed.

hopefully this issue could be fixed in OOo 3.3

thanks. 
Comment 22 az77 2010-07-02 10:42:00 UTC
some thoughts on this issue
The idea of being given the choice of
current-locale/all-current-language/selected-locales/all-locales is very good.
This seems to me to be a much better solution than adding an
all-locales-for-a-language file, since errors in applying a change to all
locales of a language, instead of some -- which are inevitable -- could be much
more easily corrected.

But I think that raising the 64k size limit would also be very useful.  64k is
tiny by today's standards.  I imagine that raising the size limit could be done
more easily.  (going from 16-bit to 32-bit index ?)
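
To make the 16-bit versus 32-bit point concrete, here is a small Python sketch. It does not describe how OOo actually stores or counts autocorrect entries; it only shows that an unsigned 16-bit field tops out at 65535, the same number as the entry limit mentioned in this issue, while an unsigned 32-bit field does not. The helper name `fits` is made up.

```python
import struct


def fits(count, fmt):
    """Return True if count can be packed with the given struct format."""
    try:
        struct.pack(fmt, count)
        return True
    except struct.error:
        return False


U16 = "<H"  # unsigned 16-bit integer, little-endian
U32 = "<I"  # unsigned 32-bit integer, little-endian

print(fits(65535, U16))  # True
print(fits(65536, U16))  # False
print(fits(65536, U32))  # True
```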
Comment 23 az77 2010-07-02 10:43:28 UTC
some thoughts on this issue
The idea of being given the choice of
current-locale/all-current-language/selected-locales/all-locales is very good.
This seems to me to be a much better solution than adding an
all-locales-for-a-language file, since errors in applying a change to all
locales of a language, instead of some -- which are inevitable -- could be much
more easily corrected.

But I think that raising the 64k size limit would also be very useful.  64k is
tiny by today's standards.  I imagine that raising the size limit could be done
more easily.  (going from 16-bit to 32-bit index ?)
Comment 24 tommy27 2010-07-03 06:41:36 UTC
@az77
If you want the devs to raise the autocorrect limit, please vote for issue 87672

as a side effect they should figure out how to speed up the autocorrect
replacement table, which is already very slow to load when many autocorrect
entries are present (see issue 101726)
Comment 25 tommy27 2011-03-20 09:37:30 UTC
just a reminder for this issue.

I know it was not scheduled for OOo 3.4 but I hope something could be done for OOo 3.5
Comment 26 tommy27 2012-05-14 20:57:24 UTC
I hope that the new Apache OpenOffice era will finally fix this issue
Comment 27 Marcus 2017-05-20 10:47:41 UTC
Reset assignee to the default "issues@openoffice.apache.org".