Details

Type: Improvement

Status: Closed

Priority: Trivial

Resolution: Fixed

Affects Version/s: None

Fix Version/s: 4.0ALPHA

Component/s: core/index

Labels:None
Description
In the current implementation FST states are expanded into a binary search on labels (from a vector of transitions) when the child count of a state exceeds a given predefined threshold (NUM_ARCS_FIXED_ARRAY). This threshold affects automaton size and traversal speed (as it turns out when benchmarked). For some degenerate data sets, closetotheroot nodes could have a small number of children (below the threshold) and yet be traversed on every single seek.
A fix of this is to introduce two control thresholds:
EXPAND state if (distancefromroot <= MIN_DISTANCE  childrencount >= NUM_ARCS_FIXED_ARRAY)
My plan is to create a data set that will prove this first and then to implement the workaround above.
I can't assign issues to myself, is that all right?