Is that too bad?
Well, my concern about the deprecated methods is that we get into a hairy backwards-compat situation...
we already had issues with this with Similarity.
It might be OK to essentially fix Similarity to be the way we want for 4.0 (break it), since it's an expert API anyway.
This patch was just a quick stab...
I definitely agree with you about the name though; I prefer Similarity.
Should Sim be aware of which field it was created for, so that there is no need to pass the field as a parameter to its methods, in case this is ever important?
Well, honestly, I think what you are saying is really needed for the future... but I would prefer to delay that until a future patch.
Making an optimized TermScorer is becoming more and more complicated; see the one in the bulkpostings branch, for example. Because of this,
it's extremely tricky to customize the scoring with good performance. I think the score caching etc. needs to be moved out of TermScorer.
Instead, the responsibility for calculating the score should reside in Similarity, including any caching it needs to do (which is really impl dependent).
Basically, Similarity needs to be responsible for score(), while TermScorer etc. deal with enumerating postings.
For example, we now have the per-field stats totalTermFreq/totalCollectionFreq for a term, but you can't take these and build, e.g., a
language-modelling-based scorer. You should be able to do that right now; only limitations in our APIs prevent it.
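To make the language-modelling example concrete, here is a rough sketch of the kind of per-document score those stats would allow. It uses Dirichlet smoothing, which is just one common choice and not something proposed in this issue. The class name, method signature and the mu parameter are made up for illustration, and it assumes totalCollectionFreq means the total number of term occurrences in the field.

```java
// Hypothetical sketch: none of these names are Lucene API.
final class LmDirichletSketch {
    private LmDirichletSketch() {}

    /**
     * Log-probability of a term given a document, with Dirichlet smoothing:
     *   p(t|d) = (tf + mu * p(t|C)) / (docLen + mu)
     *   p(t|C) = totalTermFreq / totalCollectionFreq
     *
     * Assumption: totalCollectionFreq is the total number of term
     * occurrences in the field, across all documents.
     */
    static double score(long tf, long docLen, long totalTermFreq,
                        long totalCollectionFreq, double mu) {
        // Background (collection) probability of the term in this field.
        double collectionProb = (double) totalTermFreq / totalCollectionFreq;
        // Smoothed document probability, returned in log space.
        return Math.log((tf + mu * collectionProb) / (docLen + mu));
    }
}
```

Something like LmDirichletSketch.score(5, 100, 10, 1000, 100.0) would be the per-document part; the collection probability only depends on the term and field, so it could be computed once per query.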
So in a future issue I would like to propose a patch to do just this, so that TermScorer, for example, is more general. Similarity would need to be able
to 'setup' a query (e.g. things like IDF, building score caches for the query, whatever), and then also score an individual document.
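Very roughly, the split could look like the sketch below: the Similarity does the per-query setup (IDF, score cache) and hands back something that scores one document, while the postings loop knows nothing about the formula. All the interface and class names are invented for illustration and are not an actual API; the tf-idf style formula is only a stand-in, and norms are left out to keep it short.

```java
// Hypothetical interfaces for illustration only; not Lucene's actual API.
interface DocScorerSketch {
    // Scores a single document given the term's frequency in it.
    float score(int doc, int freq);
}

abstract class SimilaritySketch {
    // Per-query 'setup': compute whatever is constant for the query
    // (IDF, caches, ...) and return a per-document scorer.
    abstract DocScorerSketch setup(long docFreq, long maxDoc, float boost);
}

final class TfIdfSketch extends SimilaritySketch {
    private static final int CACHE_SIZE = 32;

    @Override
    DocScorerSketch setup(long docFreq, long maxDoc, float boost) {
        // Query-constant weight: an idf-style factor times the boost.
        final float weight =
            (float) (Math.log((double) maxDoc / (docFreq + 1)) + 1.0) * boost;
        // The score cache for small freqs lives here, not in the scorer loop.
        final float[] cache = new float[CACHE_SIZE];
        for (int f = 0; f < CACHE_SIZE; f++) {
            cache[f] = (float) Math.sqrt(f) * weight;
        }
        return (doc, freq) ->
            freq < CACHE_SIZE ? cache[freq] : (float) Math.sqrt(freq) * weight;
    }
}

final class PostingsLoopSketch {
    private PostingsLoopSketch() {}

    // Stand-in for TermScorer: only enumerates postings and delegates scoring.
    static float[] scoreAll(int[] docs, int[] freqs, DocScorerSketch scorer) {
        float[] scores = new float[docs.length];
        for (int i = 0; i < docs.length; i++) {
            scores[i] = scorer.score(docs[i], freqs[i]);
        }
        return scores;
    }
}
```

With this shape, swapping in a different scoring model means writing another SimilaritySketch subclass; the postings loop stays untouched, which is the point of moving the caching out of the scorer.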
In the flexible scoring prototype this is what we did, but we went even further: there, a Similarity is also responsible for 'setting up' a searcher.
So that means it's responsible for managing the norm bytes (in that patch, you only had byte norms if you created them yourself in your Similarity).
I think long term that approach is definitely really interesting, but I think we can go ahead and make scoring a lot more flexible in tiny steps
like this without rewriting all of Lucene in one enormous patch... and this is safer as we can benchmark performance each step of the way.