I think we should enable setDiscountOverlaps in DefaultSimilarity by default.
If you are using synonyms, CommonGrams, or any of a number of other methods that inject terms with a position increment of zero, these currently throw off your length normalization.
These terms have a position increment of zero, so they shouldn't count towards the length of the document.
I've done relevance tests with Persian showing the difference is significant, and I think it's a big trap for anyone using synonyms, etc.: your relevance can actually get worse if you don't flip this boolean flag.
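
To make the effect concrete, here is a rough sketch of the arithmetic. It does not use the Lucene classes: `LengthNormSketch` and its `lengthNorm` method are hypothetical names for illustration, and it assumes the length norm is 1/sqrt(numTerms) with no field boost and no byte encoding of the norm. With discounting on, tokens whose position increment is zero are subtracted from the term count before the norm is computed.

```java
public class LengthNormSketch {

    /**
     * Illustrative length norm for one field (hypothetical helper, not Lucene API).
     * posIncs holds the position increment of every token the analyzer emitted.
     */
    public static double lengthNorm(int[] posIncs, boolean discountOverlaps) {
        int length = posIncs.length;
        int overlaps = 0;
        for (int inc : posIncs) {
            if (inc == 0) {
                overlaps++; // injected term stacked on the previous position
            }
        }
        int numTerms = discountOverlaps ? length - overlaps : length;
        if (numTerms <= 0) {
            return 0.0; // guard for an empty field in this sketch
        }
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        // 4 original words, 5 synonyms injected at position increment 0
        int[] posIncs = {1, 0, 1, 0, 0, 1, 0, 1, 0};
        System.out.println("overlaps counted:    " + lengthNorm(posIncs, false));
        System.out.println("overlaps discounted: " + lengthNorm(posIncs, true));
    }
}
```

In this made-up example a four-word field is normalized as if it had nine terms when overlaps are counted (norm of about 0.33 instead of 0.5), so a document is penalized simply for having words that happen to have synonyms.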
|Transition||Time In Source Status||Execution Times||Last Executer||Last Execution Date|
|2d 18h 55m||1||Robert Muir||28/Feb/10 09:18|
|395d 6h 31m||1||Grant Ingersoll||30/Mar/11 15:50|
|Status||Resolved [ 5 ]||Closed [ 6 ]|
|Fix Version/s||3.1 [ 12314822 ]|
|Status||Open [ 1 ]||Resolved [ 5 ]|
|Resolution||Fixed [ 1 ]|
|Assignee||Robert Muir [ rcmuir ]|
|Fix Version/s||3.1 [ 12314025 ]|