Description
Parse-metatags plugin
The parse-metatags plugin consists of an HTML parse filter (an implementation of Nutch's HtmlParseFilter extension point) that takes as its parameter a list of metatag names. The default value is '*' (all metatags), and multiple names are separated by ';'.
To extract the values of the description and keywords metatags, specify the following in nutch-site.xml:
<property>
  <name>metatags.names</name>
  <value>description;keywords</value>
</property>
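A minimal sketch of how such a parameter could be interpreted, assuming that names are trimmed and compared case-insensitively, that a missing or blank value falls back to the '*' default, and that '*' means "keep every metatag". The class and method names (MetatagNames, parse, select) are hypothetical and are not the plugin's actual code.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: interprets a metatags.names value such as "description;keywords".
final class MetatagNames {
    static final String WILDCARD = "*";

    private MetatagNames() {}

    // Splits the ';'-separated list; a null or blank value falls back to the '*' default.
    static Set<String> parse(String value) {
        Set<String> names = new LinkedHashSet<>();
        if (value != null) {
            for (String part : value.split(";")) {
                String name = part.trim().toLowerCase(Locale.ROOT);
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
        }
        if (names.isEmpty()) {
            names.add(WILDCARD); // assumed default when nothing usable was configured
        }
        return names;
    }

    // Keeps only the metatags whose lower-cased name was requested, or all of them for '*'.
    static Map<String, String> select(Map<String, String> metatags, Set<String> names) {
        boolean all = names.contains(WILDCARD);
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metatags.entrySet()) {
            String key = e.getKey().toLowerCase(Locale.ROOT);
            if (all || names.contains(key)) {
                kept.put(key, e.getValue());
            }
        }
        return kept;
    }
}
```

With the configuration above, a page carrying description, keywords and robots metatags would keep only the first two; with the '*' default, all three would be kept.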
The MetatagIndexer uses the output of the parse filter above to create two index fields, 'keywords' and 'description'. Note that 'keywords' is multivalued.
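The sketch below illustrates the difference between the two fields: 'description' holds a single value, while 'keywords' holds one value per keyword. It assumes that the keywords metatag is comma-separated and that blank entries are dropped; the class MetatagFields and its method toFields are hypothetical, and a plain map stands in for a real index document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turns parsed metatag values into index fields.
final class MetatagFields {
    private MetatagFields() {}

    // Assumption: the keywords metatag is comma-separated, and each keyword
    // becomes one value of the multivalued "keywords" field.
    static Map<String, List<String>> toFields(String description, String keywords) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        if (description != null && !description.trim().isEmpty()) {
            List<String> single = new ArrayList<>();
            single.add(description.trim());
            fields.put("description", single); // single-valued field
        }
        if (keywords != null) {
            List<String> values = new ArrayList<>();
            for (String k : keywords.split(",")) {
                String v = k.trim();
                if (!v.isEmpty()) {
                    values.add(v); // one field value per keyword
                }
            }
            if (!values.isEmpty()) {
                fields.put("keywords", values); // multivalued field
            }
        }
        return fields;
    }
}
```

For example, a keywords metatag of "nutch, crawler, search" would yield three separate values for the 'keywords' field rather than one concatenated string.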
To include these fields in searches, configure a boost for each of them in the query-basic plugin, e.g. in nutch-site.xml:
<property>
  <name>query.basic.description.boost</name>
  <value>2.0</value>
</property>
<property>
  <name>query.basic.keywords.boost</name>
  <value>2.0</value>
</property>
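As an illustration of what these boosts do, here is a sketch that expands a single search term across several fields and attaches each field's configured boost, rendered in Lucene query-parser syntax (field:term^boost). It assumes a default boost of 1.0 for fields without a query.basic.<field>.boost property and uses a plain map in place of the Hadoop configuration; the class BoostedQuery is hypothetical and is not how query-basic is actually implemented.

```java
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of how per-field boosts such as query.basic.description.boost
// could be applied when a single search term is expanded across several fields.
final class BoostedQuery {
    private BoostedQuery() {}

    // Reads "query.basic.<field>.boost"; missing or malformed values fall back to 1.0 (assumed default).
    static float boostFor(Map<String, String> conf, String field) {
        String raw = conf.get("query.basic." + field + ".boost");
        if (raw == null) {
            return 1.0f;
        }
        try {
            return Float.parseFloat(raw.trim());
        } catch (NumberFormatException e) {
            return 1.0f;
        }
    }

    // Builds a space-separated (OR-style) list of clauses; unboosted fields carry no "^" suffix.
    static String expand(String term, Map<String, String> conf, String... fields) {
        StringJoiner joiner = new StringJoiner(" ");
        for (String field : fields) {
            float boost = boostFor(conf, field);
            String clause = field + ":" + term;
            if (boost != 1.0f) {
                clause += "^" + String.format(Locale.ROOT, "%.1f", boost);
            }
            joiner.add(clause);
        }
        return joiner.toString();
    }
}
```

With both boosts set to 2.0 as above, a match in 'description' or 'keywords' contributes twice the weight of a match in an unboosted field such as a hypothetical 'content' field.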
This code was developed by DigitalPebble Ltd and contributed to the community by ANT.com.
Issue Links
- is related to
  - NUTCH-422 index-extra plugin creates additional fields in the index, based on configurable logic (Closed)
  - NUTCH-1005 Parse headings plugin (Closed)
- relates to
  - NUTCH-422 index-extra plugin creates additional fields in the index, based on configurable logic (Closed)
  - NUTCH-1005 Parse headings plugin (Closed)
  - NUTCH-1406 index-metadata plugin: conversion to Solr date format (Open)