Through some experimentation with BM25FQuery on long documents, I've found a bug: during scoring, the encoded norm's long value is not masked (with & 0xFF) before it is used. For long documents (or long fields), this can cause an ArrayIndexOutOfBoundsException.
The line where I suspect the bug is exposed is here:
For comparison, here is a similar use in BM25Similarity that does apply the masking:
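To illustrate why the mask matters, here is a minimal standalone sketch. It is not Lucene code. It assumes that the norm is a single byte that comes back through a long-valued API sign-extended, which is what the masking in BM25Similarity suggests. The class, the 256-entry CACHE table, and the lookupUnmasked/lookupMasked helpers are hypothetical stand-ins for the unmasked access in BM25FQuery and the masked access in BM25Similarity.

```java
// Hypothetical illustration, not Lucene source: a byte-encoded norm that is
// widened to a long keeps its sign, so encoded values >= 128 become negative.
final class NormMaskDemo {

    // Hypothetical 256-entry table indexed by the unsigned norm byte.
    static final float[] CACHE = new float[256];

    static {
        for (int i = 0; i < CACHE.length; i++) {
            CACHE[i] = i; // identity contents so lookups are easy to check
        }
    }

    /** Simulates reading a stored norm byte back as a long (byte -> long sign-extends). */
    static long readNormAsLong(int storedUnsignedByte) {
        return (byte) storedUnsignedByte;
    }

    /** Hypothetical unmasked lookup: a negative long becomes a negative index. */
    static float lookupUnmasked(long encodedNorm) {
        return CACHE[(int) encodedNorm];
    }

    /** Masked lookup: keep only the low 8 bits, so the index is always 0..255. */
    static float lookupMasked(long encodedNorm) {
        return CACHE[((byte) encodedNorm) & 0xFF];
    }

    public static void main(String[] args) {
        long norm = readNormAsLong(128);
        System.out.println("long value: " + norm);                  // -128
        System.out.println("masked lookup: " + lookupMasked(norm)); // 128.0
        try {
            lookupUnmasked(norm);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unmasked lookup threw: " + e.getMessage());
        }
    }
}
```

The masked form treats the stored byte as unsigned, so every possible norm maps to a valid slot. The unmasked form only works while the encoded value stays at or below 127.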
My experiments show that two conditions are needed to expose this bug: a token must match in more than one field (which is what BM25FQuery is for), and at least one of those fields must be >= 32792 tokens long.
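The 32792 threshold is consistent with field lengths being squeezed into one norm byte. The sketch below re-implements, from memory, the 4-bit-mantissa encoding used by Lucene's SmallFloat.intToByte4. Treat it as an assumption, not copied library source. It also assumes that the norm is the field's token count. Under those assumptions, 32791 encodes to 127, while 32792 is the first length that encodes to 128, which reads back as -128 when interpreted as a signed byte.

```java
// Sketch modeled on Lucene's SmallFloat.intToByte4 (reproduced from memory;
// an assumption, not the library source). Finds the first field length whose
// encoded norm byte is negative when read as a signed value.
final class NormThresholdDemo {

    /** Encodes a non-negative long as 3 explicit mantissa bits plus an exponent. */
    static int longToInt4(long i) {
        if (i < 0) {
            throw new IllegalArgumentException("Only non-negative values are supported: " + i);
        }
        int numBits = 64 - Long.numberOfLeadingZeros(i);
        if (numBits < 4) {
            return (int) i; // values below 8 are stored exactly
        }
        int shift = numBits - 4;
        int encoded = (int) (i >>> shift); // keep the 4 most significant bits
        encoded &= 0x07;                   // drop the implicit leading 1 bit
        encoded |= (shift + 1) << 3;       // store the exponent above the mantissa
        return encoded;
    }

    static final int MAX_INT4 = longToInt4(Integer.MAX_VALUE);
    static final int NUM_FREE_VALUES = 255 - MAX_INT4; // lengths below this are exact

    /** Maps a field length (token count) to a single byte. */
    static byte intToByte4(int length) {
        if (length < NUM_FREE_VALUES) {
            return (byte) length;
        }
        return (byte) (NUM_FREE_VALUES + longToInt4(length - NUM_FREE_VALUES));
    }

    /** Smallest length whose encoded byte is negative as a signed value. */
    static int firstNegativeLength() {
        int length = 0;
        while (intToByte4(length) >= 0) {
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println(intToByte4(32791));     // 127
        System.out.println(intToByte4(32792));     // -128
        System.out.println(firstNegativeLength()); // 32792
    }
}
```

Because the encoding is monotonic, every length below 32792 yields a byte of 127 or less and scores correctly even without the mask. Only longer fields produce the negative value that triggers the exception.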
I've provided tests in the pull request to demonstrate this.
Created a PR here: https://github.com/apache/lucene-solr/pull/2138