Type: New Feature
Resolution: Won't Fix
Affects Version/s: 0.7.2
Fix Version/s: None
Built and tested on Linux so far.
These plugins allow you to define meta tags in your nutch-site.xml file that you want to include in parsing, indexing and searching. The query plugin must replace query-basic. The format for adding query terms to nutch-site.xml is:
<description>This is a comma separated list of meta tag names that will
be parsed, indexed and searched against when parse-meta, index-meta and
query-meta are used.</description>
<description>Comma separated list of boost values when searching using
query-meta. The order of the values should match the order of meta.names.</description>
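Since the boost values are matched to the names by position, the pairing can be sketched roughly as follows. This is a minimal illustration, not code from the attached plugins: the class `MetaBoosts` and its method are hypothetical, and because the name of the boost property is not shown above, the sketch takes both configured values as plain strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the attached plugins.
class MetaBoosts {
    /** Pairs each meta tag name with the boost value at the same position. */
    static Map<String, Float> pair(String names, String boosts) {
        String[] n = names.split(",");
        String[] b = boosts.split(",");
        if (n.length != b.length) {
            throw new IllegalArgumentException(
                "names and boosts must have the same number of entries");
        }
        // LinkedHashMap keeps the configured order of the names.
        Map<String, Float> result = new LinkedHashMap<>();
        for (int i = 0; i < n.length; i++) {
            result.put(n[i].trim(), Float.parseFloat(b[i].trim()));
        }
        return result;
    }
}
```

Rejecting lists of different lengths is a design choice of the sketch; it surfaces a misconfigured site file early instead of silently dropping a boost.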
Meta tags found are assumed to have either a single value or a comma separated list of values. The values found are added to the index as Lucene keywords. For example, <meta name="keywords" content="First Thing, Second Thing"> would result in two keyword fields named "keywords": the first would contain "First Thing" and the second would contain "Second Thing".
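The splitting step described above can be sketched as below. This is an illustration under assumptions rather than the plugin source: the class `MetaValues` is hypothetical, and trimming whitespace and skipping empty entries are choices made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the attached plugins.
class MetaValues {
    /** Splits a meta tag's value into the individual values to index. */
    static List<String> split(String content) {
        List<String> values = new ArrayList<>();
        for (String part : content.split(",")) {
            String value = part.trim();
            // Assumption: empty entries (e.g. from "a,,b") are not indexed.
            if (!value.isEmpty()) {
                values.add(value);
            }
        }
        return values;
    }
}
```

Each element of the returned list would then be added as its own keyword field under the meta tag's name.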
I had to replace the query-basic plugin in order to allow matches in the meta fields to return hits even if there were no matches in any of the default fields. The query-basic plugin only returns hits when every search term is found in at least one default field. I needed hits returned if matches were found in at least one field for every term, and/or the entire search phrase appeared in a meta index field.
One known bug is that common terms are not getting stripped out of the fields' values before they get indexed, so "The Next Big Thing" cannot be matched because the query engine will strip out "the" from all queries. I intend to fix this by stripping out common terms from meta fields before indexing them.
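The intended fix could look roughly like the sketch below. It is not the planned implementation: the class `CommonTermStripper` is hypothetical, the list of common terms is a small illustrative stand-in for whatever list the query engine actually uses, and case-insensitive comparison is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of the attached plugins.
class CommonTermStripper {
    // Illustrative list; the real set of common terms is not reproduced here.
    static final Set<String> COMMON =
        new HashSet<>(Arrays.asList("the", "a", "an", "of", "and"));

    /** Removes common terms from a meta value before it is indexed. */
    static String strip(String value) {
        List<String> kept = new ArrayList<>();
        for (String word : value.trim().split("\\s+")) {
            if (!word.isEmpty() && !COMMON.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }
}
```

With this in place, the indexed value and the stripped query would line up, since both would have lost the same common terms.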
Another issue is that searching for "Next Big Thing" would not match meta index values for "Next", "Big" or "Thing". You can consider that a bug or a feature depending on how you look at it.
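To make the matching behaviour and this limitation concrete, here is a rough model in plain Java. It is only one reading of the description above, not the query-meta source: the class `MetaMatchSketch` is hypothetical, default fields are modelled as whitespace-tokenized text, meta values are modelled as whole untokenized keywords, and case-insensitive comparison is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical model for illustration only; not part of the attached plugins.
class MetaMatchSketch {
    /** True if every query term occurs in at least one default field. */
    static boolean allTermsInDefaultFields(String query, List<String> defaultFields) {
        Set<String> tokens = new HashSet<>();
        for (String field : defaultFields) {
            tokens.addAll(Arrays.asList(field.toLowerCase(Locale.ROOT).split("\\s+")));
        }
        for (String term : query.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!tokens.contains(term)) {
                return false;
            }
        }
        return true;
    }

    /** True if the whole query equals one indexed meta keyword value. */
    static boolean phraseInMeta(String query, List<String> metaValues) {
        for (String value : metaValues) {
            if (value.equalsIgnoreCase(query.trim())) {
                return true;
            }
        }
        return false;
    }

    /** A document is a hit if either condition holds. */
    static boolean isHit(String query, List<String> defaultFields, List<String> metaValues) {
        return allTermsInDefaultFields(query, defaultFields)
            || phraseInMeta(query, metaValues);
    }
}
```

In this model a document whose meta values are "Next", "Big" and "Thing" is not a hit for the query "Next Big Thing", because no single keyword value equals the whole phrase.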
These plugins were written for and only work on the 0.7.2 branch.
I'm going to attach a tarball of the source of these three plugins after I create the issue. To use the plugins, you'll need to untar them in your src/plugins directory and add them to the ant build.xml directive (and of course add them in your nutch-site.xml file). If these end up getting added to the project, I'll write up documentation on the wiki.