Details
-
Improvement
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
None
-
None
-
None
-
New
Description
Some data can lead to worst-case ~4x RAM usage due to this optimization. Several ideas were suggested to combat this on the mailing list:
I think we can improve thesituation here by tracking, per-FST instance, the size increase we're seeing while building (or perhaps do a preliminary pass before building) in order to decide whether to apply the encoding.
we could also make the encoding a bit more efficient. For instance I noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) which make gaps very costly. Associating each label with a dense id and having an intermediate lookup, ie. lookup label
> id and then id>arc offset instead of doing label->arc directly could save a lot of space in some cases? Also it seems that we are repeating the label in the arc metadata when array-with-gaps is used, even though it shouldn't be necessary since the label is implicit from the address?
Attachments
Attachments
Issue Links
- relates to
-
LUCENE-9286 FST arc.copyOf clones BitTables and this can lead to excessive memory use
- Closed
- links to