This plugin is designed to enhance the NUTCH-655 patch by doing two things:
1. Meta tags that are supplied with your crawl URLs during injection are propagated to the outlinks of those crawl URLs.
2. When you index your URLs, the meta tags that you specified with your URLs are indexed alongside those URLs and can be queried directly, provided the rest of your Nutch and indexer configuration is correct.
The flat-file of URLs you are injecting should, per NUTCH-655, be tab-delimited in the form of:
http://slashdot.org/ corp_owner=Geeknet will_it_blend=indubitably
http://engadget.com/ corp_owner=Weblogs genre=geeksquad_thriller
To activate this plugin, you must modify two properties in your nutch-site.xml:
Insert a comma-delimited list of metatags. Using the above example:
<value>corp_owner, will_it_blend, genre</value>
Note that you do not need to supply every tag with every URL. However, a tag must appear in this list if you want it to be propagated and later indexed.
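
As a rough sketch, the two properties might look like this in nutch-site.xml. The property names plugin.includes and urlmeta.tags, and the plugin id urlmeta, are not spelled out in the text above; they are assumed here from the stock Nutch configuration, so check them against the nutch-default.xml shipped with your version. The plugin.includes value is deliberately abbreviated to the indexing plugins only: add urlmeta to the value you already have rather than copying this one.

```xml
<!-- Sketch only: property names and plugin id are assumptions, see note above. -->
<property>
  <name>plugin.includes</name>
  <!-- Abbreviated value; keep your existing protocol, parse and filter plugins. -->
  <value>index-(basic|anchor)|urlmeta</value>
</property>

<property>
  <name>urlmeta.tags</name>
  <!-- Comma-delimited list of the tags used in the injected URL file. -->
  <value>corp_owner, will_it_blend, genre</value>
</property>
```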