Karl Wettin added a comment - 03/Feb/07 02:19 AM
All of the old comments were obsolete, so I re-initialized the whole issue.
NgramPhraseSuggester is now decoupled from the adaptive layer, but I would like to refactor it further so that the SpellChecker can easily be replaced with any other single-token suggester.
The patch in this issue has no dependencies on anything out of the ordinary.
However, a large refactor and a well documented version dependent on
Nicolas Lalevée [03/Mar/07 01:04 PM]
This feature looks interesting, but why should it depend on
> This feature looks interesting, but why should it depend on
It uses the Index (notification, unison index factory methods, etc.) and IndexFacade (cache, fresh reader/searcher, etc.) available in that patch. By doing that, it also enables me to use InstantiatedIndex for the a priori corpus and the ngram index to speed up the response time even more.
As the phrase-suggestion layer on top of contrib/spell in this patch was noted in a bunch of forums over the last few weeks, I've removed the 550-dependency and brought it up to date with the trunk.
Second level suggesting (ngram token, phrase) can run stand-alone. See TestTokenPhraseSuggester. However, I recommend the adaptive dictionary, as it will act as a cache on top of second level suggestions. (See docs.)
Output from using the adaptive layer only, i.e. suggestions based on how users previously behaved. About half a million user queries were analyzed to build the dictionary (takes 30 seconds to build on my dual core):
3ms pirates ofthe caribbean -> pirates of the caribbean
Using the phrase ngram token suggestion, which builds token matrices that are checked against an a priori index. A lot of queries are required for one suggestion. Using an InstantiatedIndex as the a priori index saves plenty of milliseconds. This is expensive stuff, but it works pretty well.
72ms the pilates -> the pirates
(That 0ms is because the previous one was cached. One does not have to use this cache.)
RAMDirectory vs. InstantiatedIndex as a priori index: the latter is 5 to 25 times faster (leave the first one out).
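To make the token matrix idea concrete, here is a minimal sketch of how such a suggester could work. It is not the code from the patch: the class TokenMatrixSketch, its fields and the two maps standing in for the single-token suggester and the a priori index are all hypothetical, and the real implementation queries a Lucene index rather than a map. It only illustrates why one suggestion costs many lookups and why a cache on top helps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the patch. */
class TokenMatrixSketch {

    // Stand-in for a single-token suggester such as contrib/spell.
    private final Map<String, List<String>> tokenSuggestions;
    // Stand-in for the a priori index: phrase -> number of matching documents.
    private final Map<String, Integer> aprioriHits;
    // Stand-in for the cache on top of the second level suggestions.
    private final Map<String, String> cache = new HashMap<>();
    // Counts how many times the a priori corpus was consulted.
    int aprioriLookups = 0;

    TokenMatrixSketch(Map<String, List<String>> tokenSuggestions,
                      Map<String, Integer> aprioriHits) {
        this.tokenSuggestions = tokenSuggestions;
        this.aprioriHits = aprioriHits;
    }

    /** Returns the best known phrase, or null if there is nothing better than the query. */
    String didYouMean(String query) {
        if (cache.containsKey(query)) {
            return cache.get(query);
        }
        // One row per token: the token itself plus its alternatives.
        List<List<String>> matrix = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            List<String> row = new ArrayList<>();
            row.add(token);
            row.addAll(tokenSuggestions.getOrDefault(token, List.of()));
            matrix.add(row);
        }
        // Expand the matrix into every combination of one entry per row.
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (List<String> row : matrix) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String token : row) {
                    next.add(prefix.isEmpty() ? token : prefix + " " + token);
                }
            }
            phrases = next;
        }
        // Check every candidate against the a priori corpus and keep the best one.
        String best = null;
        int bestHits = 0;
        for (String phrase : phrases) {
            aprioriLookups++;
            int hits = aprioriHits.getOrDefault(phrase, 0);
            if (hits > bestHits) {
                bestHits = hits;
                best = phrase;
            }
        }
        String suggestion = (best == null || best.equals(query)) ? null : best;
        cache.put(query, suggestion);
        return suggestion;
    }
}
```

The number of candidate phrases is the product of the row lengths, so it grows quickly with the number of tokens and alternatives. That is the reason a fast a priori index and a cache matter here.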
RAMDirectory:
InstantiatedIndex:
New in this patch:
Next patch will probably focus on:
In this patch:
SuggestionFacade facade = new SuggestionFacade(new File("data"));
facade.getDictionary().getPrioritesBySecondLevelSuggester().putAll(facade.secondLevelSuggestionFactory());
...
QuerySession session = facade.getQuerySessionManager().sessionFactory();
...
String query = "heros of mght and magik";
Hits hits = searcher.search(queryFactory(query));
String suggested = facade.didYouMean(query);
session.query(query, hits.length(), suggested);
...
facade.getQuerySessionManager().getSessionsByID().put(session);
...
facade.trainExpiredSessions();
...
facade.close();
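For readers wondering what training on expired sessions might involve, the following is a rough sketch under my own assumptions, not the algorithm in the patch. The class SessionTrainingSketch, the QueryEvent type and the rule "a follow-up query with more hits is a correction" are all hypothetical simplifications; the patch may well weigh time stamps, click-through data and the suggestion that was shown.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; the names and the training rule are assumptions. */
class SessionTrainingSketch {

    /** One query issued in a session, with the number of hits it returned. */
    static final class QueryEvent {
        final String query;
        final int hits;

        QueryEvent(String query, int hits) {
            this.query = query;
            this.hits = hits;
        }
    }

    // Learned dictionary: query -> (follow-up query -> times observed).
    private final Map<String, Map<String, Integer>> dictionary = new HashMap<>();

    /** Learns from one finished session, given in the order the queries were issued. */
    void train(List<QueryEvent> session) {
        for (int i = 0; i + 1 < session.size(); i++) {
            QueryEvent first = session.get(i);
            QueryEvent next = session.get(i + 1);
            // Assumed rule: a different follow-up query with more hits is a correction.
            if (next.hits > first.hits && !first.query.equals(next.query)) {
                dictionary.computeIfAbsent(first.query, k -> new HashMap<>())
                          .merge(next.query, 1, Integer::sum);
            }
        }
    }

    /** Returns the most frequently observed correction, or null if none is known. */
    String didYouMean(String query) {
        Map<String, Integer> candidates = dictionary.get(query);
        if (candidates == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : candidates.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

A lookup in such a dictionary is a plain map access, which fits the observation above that the adaptive layer answers in a few milliseconds while the second level suggesters are expensive.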
If anyone has some rather large query logs with session ID, time stamp and preferably click-through data that I can test this on, that would be great. It really needs to be adjusted to more than one.