Description
The attached patch allows metadata to be injected into the crawlDB. The input file must contain tab-separated fields, with the URL in the first column. Each metadata name is separated from its value by '='. An input line might look like this (with \t denoting a tab character):
http://www.myurl.com \t categ=value1 \t categ2=value2
This functionality can be useful for storing external knowledge in the crawlDB and indexing it with a custom plugin.
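The input format described above can be illustrated with a small parser. This is only a sketch in Python, not the code from the patch: the function name parse_inject_line is hypothetical, and trimming whitespace around fields and skipping fields without '=' are assumptions about how malformed input might be handled.

```python
def parse_inject_line(line):
    """Split one tab-separated input line into (url, metadata dict).

    Hypothetical helper for illustration; not part of the Nutch patch.
    """
    # Fields are separated by tabs; the URL is the first column.
    fields = [field.strip() for field in line.rstrip("\n").split("\t")]
    url = fields[0]

    metadata = {}
    for field in fields[1:]:
        # Each remaining field is expected to look like name=value.
        name, sep, value = field.partition("=")
        if not sep or not name.strip():
            # Assumption: silently skip fields that are empty or lack '='.
            continue
        metadata[name.strip()] = value.strip()
    return url, metadata


url, meta = parse_inject_line(
    "http://www.myurl.com\tcateg=value1\tcateg2=value2"
)
```

Here url holds the first column and meta maps each metadata name to its value, which is the shape a custom indexing plugin would need to read back.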
Attachments
Issue Links
- relates to NUTCH-628 Host database to keep track of host-level information (Closed)