NUTCH-1478

Parse-metatags and index-metadata plugin for Nutch 2.x series

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1
    • Fix Version/s: 2.3
    • Component/s: parser
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x series. This will take multiple values of the same tag and index them in Solr, as I patched before (https://issues.apache.org/jira/browse/NUTCH-1467).

      The usage is the same as described here (http://wiki.apache.org/nutch/IndexMetatags), with one change: there is no need to give the 'metatag' keyword before metatag names. For example, my configuration looks like this (https://github.com/salvager/NutchDev/blob/master/runtime/local/conf/nutch-site.xml).

      This is only the first version and does not include the JUnit test. I will upload a new version soon.

      This will parse the tags and index them in Solr. Make sure the fields you list in 'index.parse.md' in nutch-site.xml are also created in Solr's schema.xml.
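
      For illustration only, a rough sketch of the relevant nutch-site.xml entries, assembled from the property names quoted later in this thread (plugin.includes is the usual Nutch switch for enabling plugins; whether the index.parse.md keys carry a 'metatag.' prefix, and which separator they use, varies between the patch versions discussed below):

          <!-- illustrative nutch-site.xml snippet; adjust to the patch version in use -->
          <property>
              <name>plugin.includes</name>
              <!-- append parse-metatags and index-metadata to your existing plugin list -->
              <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
          </property>
          <property>
              <name>metatags.names</name>
              <!-- metatags to extract, separated by ';'; use '*' to extract all of them -->
              <value>description;keywords</value>
          </property>
          <property>
              <name>index.parse.md</name>
              <!-- parse metadata keys to turn into index fields -->
              <value>description,keywords</value>
          </property>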

      Please let me know if you have any suggestions.

      This is supported by DLA (Digital Library and Archives) of Virginia Tech.

      1. Nutch1478.patch
        8 kB
        kiran
      2. Nutch1478.zip
        13 kB
        kiran
      3. metadata_parseChecker_sites.png
        280 kB
        kiran
      4. NUTCH-1478-parse-v2.patch
        17 kB
        Tien Nguyen Manh
      5. NUTCH-1478v3.patch
        30 kB
        Lewis John McGibbney
      6. NUTCH-1478v4.patch
        29 kB
        Yasin Kılınç
      7. NUTCH-1478v5.patch
        36 kB
        Talat UYARER
      8. NUTCH-1478v5.1.patch
        6 kB
        Vangelis Karvounis
      9. NUTCH-1478v6.patch
        34 kB
        Talat UYARER

        Activity

        kiran added a comment -

        Unzip the zip file into src/plugin in the Nutch 2.x source and apply the patch. This worked for me. Please let me know if you have any issues.

        J. Gobel added a comment - edited

        Hi Kiran,

        I applied the patch, but when I check the MySQL field I still see 'garbage': csh�����
        Also, the link provided to your XML file is no longer working.

        kiran added a comment -

        Hi Gobel,

        I have updated the broken link (https://github.com/salvager/NutchDev/blob/master/runtime/local/conf/nutch-site.xml).

        Can you please run './bin/nutch parsechecker http://www.google.com' and check whether you are able to see metadata in the output?

        Did you add field names or '*' in the metatags.names property in nutch-site.xml?

        Thank you,
        Kiran.

        J. Gobel added a comment - edited

        Hi Kiran,

        I unpacked the zip file in my plugin folder. Then I downloaded the patch file into my src/plugin folder with wget and applied it using 'patch -p0 < Nutch1478.patch'.

        I used your XML file, changed a few things, and rebuilt the runtime with ant. For example, I use MySQL and changed the path to my plugins folder.

        I checked with parsechecker and this is the result:
        :~/nutch2/nutch/runtime/local# bin/nutch parsechecker http://www.google.nl
        ---------
        Url
        ---------------
        http://www.google.nl
        ---------
        Metadata
        ---------

        I emptied my SQL database to start from scratch and did a crawl, and what I see in the Metadata field is still 'garbage'. I have my Nutch 2.1 configured according to http://nlp.solutions.asia/?p=180

        Perhaps you can share your schema.xml file as well? Maybe I am doing something wrong in there?

        Thanks in advance,

        Jaap

        kiran added a comment -

        This is a screenshot of how my parsechecker is working after I configured Nutch 2.x with the plugins.

        kiran added a comment - edited

        Hi Jaap,

        I ran the same command as you did, and it looks like there are no metatags on that page. Please check the attached screenshot of the different websites I parsed and the metadata found for each.

        Once parsechecker is working, we should make sure indexing works too. For that, we need to define the fields we want indexed in the index.parse.md property in nutch-site.xml. There is a difference between 1.x and 2.x in the way this property should be defined.

        When I was working with this plugin, I was able to define the metatag fields as-is (without the 'metatag.' prefix used in 1.x), and the same way in the schema, and it worked for me. This is my schema (https://github.com/salvager/apache-solr-4.0.0-BETA/blob/master/example/solr/ejournals/conf/schema.xml).

        The dc fields that I have defined are particular to the website I am crawling; they might not be present on all websites.
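
        For illustration, the matching Solr schema.xml entries might look roughly like the sketch below; the field names have to match the keys listed in index.parse.md (the 'metatag.'-prefixed form appears in the nutch-default.xml excerpt further down this thread):

            <!-- illustrative Solr schema.xml fields; names must match the index.parse.md keys -->
            <field name="description" type="string" stored="true" indexed="true"/>
            <field name="keywords" type="string" stored="true" indexed="true"/>
            <!-- with the 'metatag.' prefix convention instead: -->
            <!-- <field name="metatag.description" type="string" stored="true" indexed="true"/> -->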

        I hope this helps.

        J. Gobel added a comment -

        Hi Kiran,

        Thanks for replying and uploading your schema.xml. It is always good to have some sort of reference material.

        I checked other sites, and indeed it works. I was just so into fixing it that I totally forgot that some sites just don't use metadata any longer.

        However, I do notice that it doesn't seem to fetch titles. nutch.apache.org does have a title in its metadata; I checked the source of the page.

        Rgds,
        Jaap

        kiran added a comment -

        I think this is a problem with parsechecker in 2.x. Only the fields from metatags are getting displayed while the other fields are not printed even though they are parsed and indexed.

        For me, those fields are parsed and indexed in Solr; I can see the results, but parsechecker does not display them exactly. A new issue needs to be created for that. This plugin only deals with parsing metatags and indexing them.

        Regards,
        Kiran.

        J. Gobel added a comment - edited

        Hi Kiran,

        I have spent some time checking and monitoring the updates in my MySQL Metadata field, and something odd is happening.
        Just before the crawl finishes, the metadata field is updated with correct information; I can see it being filled with robots index, follow, description, etc. But as soon as the crawl has finished, the metadata field is updated to: csh�����

        I copy-pasted my log below (just the last lines). I am aware that there are still some issues with MySQL as a backend for Nutch 2.x.

        P.S. I use: bin/nutch crawl urls -depth 1 -topN 5 ..

        2013-01-01 11:55:53,177 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
        2013-01-01 11:55:53,903 INFO parse.ParserJob - Parsing http://nutch.apache.com/
        2013-01-01 11:55:54,589 WARN parse.MetaTagsParser - Found meta tag : robots index, follow
        2013-01-01 11:55:54,589 WARN parse.MetaTagsParser - Found meta tag : keywords .com.nl .net.nl com.nl net.nl sld, tld, domain, registry, domain registry, nic, extention, icann
        2013-01-01 11:55:54,590 WARN parse.MetaTagsParser - Found meta tag : description Registreer nu uw .com.nl of .net.nl extentie.
        2013-01-01 11:55:54,619 INFO regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
        2013-01-01 11:55:55,240 WARN mapred.FileOutputCommitter - Output path is null in cleanup
        2013-01-01 11:55:56,652 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
        2013-01-01 11:55:59,574 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
        2013-01-01 11:55:59,575 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
        2013-01-01 11:55:59,575 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
        2013-01-01 11:55:59,575 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
        2013-01-01 11:56:02,554 WARN mapred.FileOutputCommitter - Output path is null in cleanup

        J. Gobel added a comment -

        Where can I close my comments? It works as designed.

        Roland von Herget added a comment -

        +1
        works fine for me.
        Thank you kiran

        kiran added a comment -

        Thanks Roland for testing.

        I will try to update this patch based on my update in 1.x by using the Metadata data structure, and also add the test.

        Nick added a comment -

        This plugin works great if the page has the metatags mentioned in the index.content.md but breaks if they are missing. How do I go about making the fields optional?

        <property>
        <name>index.content.md</name>
        <value>description,keywords,author</value>
        </property>

        bin/nutch indexchecker http://localhost/stories/
        fetching: http://localhost/stories/
        parsing: http://localhost/stories/
        contentType: text/html
        Exception in thread "main" java.lang.NullPointerException
        at org.apache.nutch.indexer.metadata.MetadataIndexer.filter(MetadataIndexer.java:95)
        at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:107)
        at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:127)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:151)

        bin/nutch indexchecker http://localhost/stories/cant-be-satisfied
        [1] 5726
        fetching: http://localhost/stories/cant-be-satisfied
        parsing: http://localhost/stories/cant-be-satisfied
        contentType: text/html
        content : Can't be Satisfied
        author : Robert Gordon
        title : Can't be Satisfied
        keywords : blues, music, muddy water
        host : localhost
        description : Life and Times of Muddy Waters
        tstamp : 2013-10-19T01:34:41.440Z
        url : http://localhost/stories/cant-be-satisfied

        kiran added a comment -

        This plugin is not up to date with the patch at NUTCH-1467; it should be updated, a test case added, and the above issue fixed if it still exists. I will work on it soon.

        Tien Nguyen Manh added a comment -

        I ported parse-metatags to 2.x; this patch supports multi-value metatags.

        Lewis John McGibbney added a comment - edited

        The previous patch did not compile.
        This patch adds the index-metadata plugin as per the original patch and adds correct formatting. Finally, in addition to the existing patch, I've added a small improvement which checks that the metatags string array has more than one value before adding \t.
        If you apply the patch you will see the test failing for TestMetatagsParser... this needs to be fixed, but I won't be able to do it right now.
        kiran, do you fancy having a look at this if you get time?

        Yasin Kılınç added a comment -

        I reviewed this patch and fixed some bugs. +1 for commit.

        Yasin Kılınç added a comment -

        I added a new patch. It passes all test cases.

        Anton added a comment - edited

        I tried NUTCH-1478v4.patch.

        When I configure index.content.md = description or index.content.md = metatag.description in nutch-default.xml:

            <property>
                <name>index.parse.md</name>
                <value>metatag.description</value>
                <description>
                    Comma-separated list of keys to be taken from the parse metadata to generate fields.
                    Can be used e.g. for 'description' or 'keywords' provided that these values are generated
                    by a parser (see parse-metatags plugin)
                </description>
            </property>
        
            <property>
                <name>index.content.md</name>
                <value>description</value>
                <description>
                    Comma-separated list of keys to be taken from the content metadata to generate fields.
                </description>
            </property>
        
            <property>
                <name>index.db.md</name>
                <value></value>
                <description>
                    Comma-separated list of keys to be taken from the crawldb metadata to generate fields.
                    Can be used to index values propagated from the seeds with the plugin urlmeta
                </description>
            </property>
        
            <!-- parse-metatags plugin properties -->
            <property>
                <name>metatags.names</name>
                <value>description</value>
                <description> Names of the metatags to extract, separated by;.
                    Use '*' to extract all metatags. Prefixes the names with 'metatag.'
                    in the parse-metadata. For instance to index description and keywords,
                    you need to activate the plugin index-metadata and set the value of the
                    parameter 'index.parse.md' to 'metatag.description;metatag.keywords'.
                </description>
            </property>
        

        I got an NPE:

        14/02/04 13:00:47 WARN mapred.LocalJobRunner: job_local1932930342_0001
        java.lang.Exception: java.lang.NullPointerException
        	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
        Caused by: java.lang.NullPointerException
        	at org.apache.nutch.indexer.metadata.MetadataIndexer.filter(MetadataIndexer.java:95)
        	at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:107)
        	at org.apache.nutch.indexer.IndexUtil.index(IndexUtil.java:77)
        	at org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:103)
        	at org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:61)
        	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        	at java.lang.Thread.run(Thread.java:744)
        
        Lewis John McGibbney added a comment -

        Hi Anton, thank you for testing this patch. However, it is difficult for us to reproduce if we do not know when you encountered the NPE. Can you elaborate on how we can reproduce it? Thank you.

        Anton added a comment -

        Steps to reproduce:
        1) Add fields for metatags
        <field name="metatag.description" type="string" stored="true" indexed="true"/>
        in schema.xml, in both Solr and Nutch
        2) restart solr
        3) configure nutch-default.xml as in my comment above
        4) setup urls/seed.txt in nutch
        5) ant clean && ant runtime
        6) run crawl command

        I use solr-4.6.0 and apache-nutch-2.2.1.

        When I run a full crawl with the following command:

        /home/hadoop/webcrawer/apache-nutch-2.2.1/runtime/deploy/bin/crawl urls/seed.txt az http://localhost:8088/solr/ 1

        The metadata is successfully parsed and stored in the database; the problem occurs in SolrIndexerJob:

        14/02/04 13:00:46 INFO solr.SolrIndexerJob: SolrIndexerJob: starting
        14/02/04 13:00:46 INFO plugin.PluginRepository: Plugins: looking in: /home/hadoop/data/hadoop-unjar8289682370547831088/classes/plugins
        14/02/04 13:00:46 INFO plugin.PluginRepository: Plugin Auto-activation mode: [true]
        14/02/04 13:00:46 INFO plugin.PluginRepository: Registered Plugins:
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	the nutch core extension points (nutch-extensionpoints)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Basic URL Normalizer (urlnormalizer-basic)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Html Parse Plug-in (parse-html)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Basic Indexing Filter (index-basic)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	HTTP Framework (lib-http)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Pass-through URL Normalizer (urlnormalizer-pass)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Regex URL Filter (urlfilter-regex)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Http Protocol Plug-in (protocol-http)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Regex URL Normalizer (urlnormalizer-regex)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Tika Parser Plug-in (parse-tika)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	OPIC Scoring Plug-in (scoring-opic)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	CyberNeko HTML Parser (lib-nekohtml)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Anchor Indexing Filter (index-anchor)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Regex URL Filter Framework (lib-regex-filter)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	MetaTags (parse-metatags)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Index Metadata (index-metadata)
        14/02/04 13:00:46 INFO plugin.PluginRepository: Registered Extension-Points:
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Nutch Protocol (org.apache.nutch.protocol.Protocol)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Parse Filter (org.apache.nutch.parse.ParseFilter)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Nutch URL Filter (org.apache.nutch.net.URLFilter)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Nutch Content Parser (org.apache.nutch.parse.Parser)
        14/02/04 13:00:46 INFO plugin.PluginRepository: 	Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
        14/02/04 13:00:46 INFO basic.BasicIndexingFilter: Maximum title length for indexing set to: 100
        14/02/04 13:00:46 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
        14/02/04 13:00:46 INFO anchor.AnchorIndexingFilter: Anchor deduplication is: off
        14/02/04 13:00:46 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
        14/02/04 13:00:46 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.metadata.MetadataIndexer
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:host.name=ascompany.info
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_45
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/hadoop/hadoop-1.2.1/libexec/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/hadoop/hadoop-1.2.1/libexec/..:/home/hadoop/hadoop-1.2.1/libexec/../hadoop-core-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/asm-3.2.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/aspectjrt-1.6.11.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/aspectjtools-1.6.11.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-cli-1.2.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-codec-1.4.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-configuration-1.6.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-daemon-1.0.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-digester-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-el-1.0.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-io-2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-lang-2.4.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-logging-api-1.0.4.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-math-2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-net-3.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/core-3.1.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hadoop-capacity-scheduler-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hadoop-fairscheduler-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hadoop-thriftfs-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hsqldb-1.8.0.10.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jasper-compiler-5.5.12.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jasper-runtime-5.5.12.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jdeb-0.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jersey-core-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jersey-json-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jersey-server-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jetty-6.1.26.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jsch-0.1.42.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/junit-4.5.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/log4j-1.2.15.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/oro-2.0.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/servlet-api-2.5-20081211.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/slf4j-api-1.4.3.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jsp-2.1/jsp-2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jsp-2.1/jsp-api-2.1.jar
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop-1.2.1/libexec/../lib/native/Linux-amd64-64
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:os.version=3.2.0-4-amd64
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
        14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
        14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
        14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d1, negotiated timeout = 180000
        14/02/04 13:00:46 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names  mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same.
        14/02/04 13:00:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
        14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
        14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
        14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d2, negotiated timeout = 180000
        14/02/04 13:00:46 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names  mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same.
        14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
        14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
        14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
        14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d3, negotiated timeout = 180000
        14/02/04 13:00:47 INFO mapred.JobClient: Running job: job_local1932930342_0001
        14/02/04 13:00:47 INFO mapred.LocalJobRunner: Waiting for map tasks
        14/02/04 13:00:47 INFO mapred.LocalJobRunner: Starting task: attempt_local1932930342_0001_m_000000_0
        14/02/04 13:00:47 INFO util.ProcessTree: setsid exited with exit code 0
        14/02/04 13:00:47 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2ba04d20
        14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d4, negotiated timeout = 180000
        14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names  mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same.
        14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d5, negotiated timeout = 180000
        14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names  mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same.
        14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d6, negotiated timeout = 180000
        14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names  mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same.
        14/02/04 13:00:47 INFO mapred.MapTask: Processing split: org.apache.gora.mapreduce.GoraInputSplit@3a37f44d
        14/02/04 13:00:47 INFO mapreduce.GoraRecordReader: gora.buffer.read.limit = 10000
        14/02/04 13:00:47 INFO solr.SolrIndexerJob: Authenticating as: solr-user
        14/02/04 13:00:47 INFO conf.Configuration: found resource solrindex-mapping.xml at file:/home/hadoop/data/hadoop-unjar8289682370547831088/solrindex-mapping.xml
        14/02/04 13:00:47 INFO solr.SolrMappingReader: source: content dest: content
        14/02/04 13:00:47 INFO solr.SolrMappingReader: source: title dest: title
        14/02/04 13:00:47 INFO solr.SolrMappingReader: source: host dest: host
        14/02/04 13:00:47 INFO solr.SolrMappingReader: source: batchId dest: batchId
        14/02/04 13:00:47 INFO solr.SolrMappingReader: source: boost dest: boost
        14/02/04 13:00:47 INFO solr.SolrMappingReader: source: digest dest: digest
        14/02/04 13:00:47 INFO solr.SolrMappingReader: source: tstamp dest: tstamp
        14/02/04 13:00:47 INFO basic.BasicIndexingFilter: Maximum title length for indexing set to: 100
        14/02/04 13:00:47 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
        14/02/04 13:00:47 INFO anchor.AnchorIndexingFilter: Anchor deduplication is: off
        14/02/04 13:00:47 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
        14/02/04 13:00:47 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.metadata.MetadataIndexer
        14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d7, negotiated timeout = 180000
        14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names  mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same.
        14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
        14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d8, negotiated timeout = 180000
        14/02/04 13:00:47 INFO mapred.LocalJobRunner: Map task executor complete.
        14/02/04 13:00:47 WARN mapred.FileOutputCommitter: Output path is null in cleanup
        14/02/04 13:00:47 WARN mapred.LocalJobRunner: job_local1932930342_0001
        java.lang.Exception: java.lang.NullPointerException
        	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
        Caused by: java.lang.NullPointerException
        	at org.apache.nutch.indexer.metadata.MetadataIndexer.filter(MetadataIndexer.java:95)
        	at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:107)
        	at org.apache.nutch.indexer.IndexUtil.index(IndexUtil.java:77)
        	at org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:103)
        	at org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:61)
        	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        	at java.lang.Thread.run(Thread.java:744)
        14/02/04 13:00:48 INFO mapred.JobClient:  map 0% reduce 0%
        14/02/04 13:00:48 INFO mapred.JobClient: Job complete: job_local1932930342_0001
        14/02/04 13:00:48 INFO mapred.JobClient: Counters: 0
        14/02/04 13:00:48 ERROR solr.SolrIndexerJob: SolrIndexerJob: java.lang.RuntimeException: job failed: name=[az]solr-index, jobid=job_local1932930342_0001
        	at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
        	at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:46)
        	at org.apache.nutch.indexer.solr.SolrIndexerJob.indexSolr(SolrIndexerJob.java:54)
        	at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:76)
        	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        	at org.apache.nutch.indexer.solr.SolrIndexerJob.main(SolrIndexerJob.java:85)
        	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        	at java.lang.reflect.Method.invoke(Method.java:606)
        	at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
        
        Show
        Anton added a comment - Steps to reproduce: 1) Add fields for metatags <field name="metatag.description" type="string" stored="true" indexed="true"/> in schema.xml both in solr and nutch 2) restart solr 3) configure nutch-default.xml as in my comment above 4) setup urls/seed.txt in nutch 5) ant clean && ant runtime 6) run crawl command I use solr-4.6.0 apache-nutch-2.2.1 When I run full crawl with such command /home/hadoop/webcrawer/apache-nutch-2.2.1/runtime/deploy/bin/crawl urls/seed.txt az http://localhost:8088/solr/ 1 metadata is successfully parsed and stored in database, problem occurs in SolrIndexerJob 14/02/04 13:00:46 INFO solr.SolrIndexerJob: SolrIndexerJob: starting 14/02/04 13:00:46 INFO plugin.PluginRepository: Plugins: looking in: /home/hadoop/data/hadoop-unjar8289682370547831088/classes/plugins 14/02/04 13:00:46 INFO plugin.PluginRepository: Plugin Auto-activation mode: [ true ] 14/02/04 13:00:46 INFO plugin.PluginRepository: Registered Plugins: 14/02/04 13:00:46 INFO plugin.PluginRepository: the nutch core extension points (nutch-extensionpoints) 14/02/04 13:00:46 INFO plugin.PluginRepository: Basic URL Normalizer (urlnormalizer-basic) 14/02/04 13:00:46 INFO plugin.PluginRepository: Html Parse Plug-in (parse-html) 14/02/04 13:00:46 INFO plugin.PluginRepository: Basic Indexing Filter (index-basic) 14/02/04 13:00:46 INFO plugin.PluginRepository: HTTP Framework (lib-http) 14/02/04 13:00:46 INFO plugin.PluginRepository: Pass-through URL Normalizer (urlnormalizer-pass) 14/02/04 13:00:46 INFO plugin.PluginRepository: Regex URL Filter (urlfilter-regex) 14/02/04 13:00:46 INFO plugin.PluginRepository: Http Protocol Plug-in (protocol-http) 14/02/04 13:00:46 INFO plugin.PluginRepository: Regex URL Normalizer (urlnormalizer-regex) 14/02/04 13:00:46 INFO plugin.PluginRepository: Tika Parser Plug-in (parse-tika) 14/02/04 13:00:46 INFO plugin.PluginRepository: OPIC Scoring Plug-in (scoring-opic) 14/02/04 13:00:46 INFO plugin.PluginRepository: CyberNeko HTML Parser (lib-nekohtml) 14/02/04 13:00:46 INFO plugin.PluginRepository: Anchor Indexing Filter (index-anchor) 14/02/04 13:00:46 INFO plugin.PluginRepository: Regex URL Filter Framework (lib-regex-filter) 14/02/04 13:00:46 INFO plugin.PluginRepository: MetaTags (parse-metatags) 14/02/04 13:00:46 INFO plugin.PluginRepository: Index Metadata (index-metadata) 14/02/04 13:00:46 INFO plugin.PluginRepository: Registered Extension-Points: 14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer) 14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch Protocol (org.apache.nutch.protocol.Protocol) 14/02/04 13:00:46 INFO plugin.PluginRepository: Parse Filter (org.apache.nutch.parse.ParseFilter) 14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch URL Filter (org.apache.nutch.net.URLFilter) 14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter) 14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch Content Parser (org.apache.nutch.parse.Parser) 14/02/04 13:00:46 INFO plugin.PluginRepository: Nutch Scoring (org.apache.nutch.scoring.ScoringFilter) 14/02/04 13:00:46 INFO basic.BasicIndexingFilter: Maximum title length for indexing set to: 100 14/02/04 13:00:46 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 14/02/04 13:00:46 INFO anchor.AnchorIndexingFilter: Anchor deduplication is: off 14/02/04 13:00:46 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 14/02/04 
13:00:46 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.metadata.MetadataIndexer 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.3.2-1031432, built on 11/05/2010 05:32 GMT 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:host.name=ascompany.info 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_45 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/hadoop/hadoop-1.2.1/libexec/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/hadoop/hadoop-1.2.1/libexec/..:/home/hadoop/hadoop-1.2.1/libexec/../hadoop-core-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/asm-3.2.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/aspectjrt-1.6.11.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/aspectjtools-1.6.11.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-beanutils-1.7.0.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-cli-1.2.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-codec-1.4.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-collections-3.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-configuration-1.6.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-daemon-1.0.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-digester-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-el-1.0.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-httpclient-3.0.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-io-2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-lang-2.4.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-logging-1.1.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-logging-api-1.0.4.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-math-2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/commons-net-3.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/core-3.1.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hadoop-capacity-scheduler-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hadoop-fairscheduler-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hadoop-thriftfs-1.2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/hsqldb-1.8.0.10.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jackson-core-asl-1.8.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jasper-compiler-5.5.12.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jasper-runtime-5.5.12.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jdeb-0.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jersey-core-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jersey-json-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jersey-server-1.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jets3t-0.6.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jetty-6.1.26.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jetty-util-6.1.26.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jsch-0.1.42.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/junit-4.5.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/kfs-0.2.2.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/log4j-1.2.15.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/mockito-all-1.8.5.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/oro-2.0.8.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/servlet-api-2.5-20081211.jar:/home/hadoop/hadoop-1.2.1/libexec
/../lib/slf4j-api-1.4.3.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/slf4j-log4j12-1.4.3.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/xmlenc-0.52.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jsp-2.1/jsp-2.1.jar:/home/hadoop/hadoop-1.2.1/libexec/../lib/jsp-2.1/jsp-api-2.1.jar 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop-1.2.1/libexec/../lib/ native /Linux-amd64-64 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA> 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:os.version=3.2.0-4-amd64 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/hadoop 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d1, negotiated timeout = 180000 14/02/04 13:00:46 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same. 14/02/04 13:00:46 INFO util.NativeCodeLoader: Loaded the native -hadoop library 14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d2, negotiated timeout = 180000 14/02/04 13:00:46 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same. 
14/02/04 13:00:46 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 14/02/04 13:00:46 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d3, negotiated timeout = 180000 14/02/04 13:00:47 INFO mapred.JobClient: Running job: job_local1932930342_0001 14/02/04 13:00:47 INFO mapred.LocalJobRunner: Waiting for map tasks 14/02/04 13:00:47 INFO mapred.LocalJobRunner: Starting task: attempt_local1932930342_0001_m_000000_0 14/02/04 13:00:47 INFO util.ProcessTree: setsid exited with exit code 0 14/02/04 13:00:47 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2ba04d20 14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d4, negotiated timeout = 180000 14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same. 14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d5, negotiated timeout = 180000 14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same. 14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d6, negotiated timeout = 180000 14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same. 
14/02/04 13:00:47 INFO mapred.MapTask: Processing split: org.apache.gora.mapreduce.GoraInputSplit@3a37f44d 14/02/04 13:00:47 INFO mapreduce.GoraRecordReader: gora.buffer.read.limit = 10000 14/02/04 13:00:47 INFO solr.SolrIndexerJob: Authenticating as: solr-user 14/02/04 13:00:47 INFO conf.Configuration: found resource solrindex-mapping.xml at file:/home/hadoop/data/hadoop-unjar8289682370547831088/solrindex-mapping.xml 14/02/04 13:00:47 INFO solr.SolrMappingReader: source: content dest: content 14/02/04 13:00:47 INFO solr.SolrMappingReader: source: title dest: title 14/02/04 13:00:47 INFO solr.SolrMappingReader: source: host dest: host 14/02/04 13:00:47 INFO solr.SolrMappingReader: source: batchId dest: batchId 14/02/04 13:00:47 INFO solr.SolrMappingReader: source: boost dest: boost 14/02/04 13:00:47 INFO solr.SolrMappingReader: source: digest dest: digest 14/02/04 13:00:47 INFO solr.SolrMappingReader: source: tstamp dest: tstamp 14/02/04 13:00:47 INFO basic.BasicIndexingFilter: Maximum title length for indexing set to: 100 14/02/04 13:00:47 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.basic.BasicIndexingFilter 14/02/04 13:00:47 INFO anchor.AnchorIndexingFilter: Anchor deduplication is: off 14/02/04 13:00:47 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter 14/02/04 13:00:47 INFO indexer.IndexingFilters: Adding org.apache.nutch.indexer.metadata.MetadataIndexer 14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d7, negotiated timeout = 180000 14/02/04 13:00:47 INFO store.HBaseStore: Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'az_webpage' , assuming they are the same. 14/02/04 13:00:47 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session 14/02/04 13:00:47 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x142ea7be01213d8, negotiated timeout = 180000 14/02/04 13:00:47 INFO mapred.LocalJobRunner: Map task executor complete. 
14/02/04 13:00:47 WARN mapred.FileOutputCommitter: Output path is null in cleanup
14/02/04 13:00:47 WARN mapred.LocalJobRunner: job_local1932930342_0001
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NullPointerException
    at org.apache.nutch.indexer.metadata.MetadataIndexer.filter(MetadataIndexer.java:95)
    at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:107)
    at org.apache.nutch.indexer.IndexUtil.index(IndexUtil.java:77)
    at org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:103)
    at org.apache.nutch.indexer.IndexerJob$IndexerMapper.map(IndexerJob.java:61)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/02/04 13:00:48 INFO mapred.JobClient: map 0% reduce 0%
14/02/04 13:00:48 INFO mapred.JobClient: Job complete: job_local1932930342_0001
14/02/04 13:00:48 INFO mapred.JobClient: Counters: 0
14/02/04 13:00:48 ERROR solr.SolrIndexerJob: SolrIndexerJob: java.lang.RuntimeException: job failed: name=[az]solr-index, jobid=job_local1932930342_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:46)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.indexSolr(SolrIndexerJob.java:54)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.main(SolrIndexerJob.java:85)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
        Lewis John McGibbney added a comment -

        Can you tell us what is happening at org.apache.nutch.indexer.metadata.MetadataIndexer.filter(MetadataIndexer.java:95)? I am not running with this patch right now but might get around to it later on.

        Talat UYARER added a comment -

        Hi Anton,

        Your problem is because of your schema definition. You should define meta_description. I can solve this problem.

        Anton added a comment - - edited

        Hi Lewis John McGibbney,
        Snippet of the source code with the NPE below.
        I added a comment to mark line 95 with the NPE:

            // add the fields from contentmd
            if (contentFieldnames != null) {
              for (String metatag : contentFieldnames) {
                // String[] value = parse.getData().getContentMeta().getValues(metatag);
                ByteBuffer bvalues = page.getFromMetadata(new Utf8(metatag));
                String value = new String(bvalues.array());                       //line 95 with NPE
                if (value != null)
                  doc.add("meta_" + metatag, value);
        
              }
            }
        

        Hi Talat UYARER, do you mean that I need to define another field name in schema.xml?
        I have this field definition now:

         <field name="metatag.description" type="string" stored="true" indexed="true"/>
        

        It has the same name as in the wiki http://wiki.apache.org/nutch/IndexMetatags,
        but a different field type ('string' instead of the wiki's 'text').

        PS: I will try 'meta_description', maybe it helps.
        PPS: I tried 'meta_description'. It did not help. I have the same NPE as above at line 95.
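
        For reference, the NPE at line 95 happens because page.getFromMetadata() returns null when the page has no entry for that metatag, and new String(bvalues.array()) dereferences the buffer before the null check. A minimal sketch of a guard that would avoid it (an assumption about a possible fix, not necessarily what the later patches actually do):

            // hypothetical guard, not the committed fix: skip metatags that are absent on this page
            ByteBuffer bvalues = page.getFromMetadata(new Utf8(metatag));
            if (bvalues != null) {
              String value = new String(bvalues.array());
              doc.add("meta_" + metatag, value);
            }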

        Talat UYARER added a comment -

        It is not related to the topic, but this patch has a naming problem. Why do we use the name MetatagIndexer for an index filter? Wdyt Lewis John McGibbney?

        Talat UYARER added a comment -

        Hi Anton,

        Can you share your seed list and the problematic key? I want to try it.

        Anton added a comment -

        I have the issue with this seed.txt:

        http://temel.az
        http://www.sambo.az
        http://avtosalon.az/
        
        Talat UYARER added a comment - - edited

        I fixed several mistakes in the patch. This is the final version. Anton, can you test the patch?

        Anton added a comment - - edited

        Yes, Nutch with the v5 patch works fine without errors.
        Thanks!!!

        In $SOLR_HOME/conf/schema.xml I use these field names, which differ from the current wiki suggestion http://wiki.apache.org/nutch/IndexMetatags:

             <!-- fields for metatags -->
             <field name="meta_description" type="string" stored="true" indexed="true"/>
             <field name="meta_keywords" type="string" stored="true" indexed="true"/> 
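
        To check whether the new fields actually reach the index, a simple Solr query helps; a hypothetical example (the core URL and field names are assumptions based on the configuration above, adjust them to your setup):

            curl "http://localhost:8983/solr/select?q=meta_keywords:*&fl=url,meta_description,meta_keywords&wt=json"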
        
        Vangelis Karvounis added a comment -

        Hi! I have a few questions on how to run this patch:
        1. In nutch-site.xml:
        <property>
        <name>plugin.includes</name>
        <value>protocol-http|urlfilter-domain|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)|urlnormalizer-(pass|regex|basic)|scoring-opic</value>
        <description> </description>
        </property>

        2. In nutch-site.xml can you tell us how to use those 4 new properties?
        <property>
        <name>index.parse.md</name>
        <value>description,keywords</value>
        <description></description>
        </property>

        <property>
        <name>index.content.md</name>
        <value></value>
        <description> </description>
        </property>

        <property>
        <name>index.db.md</name>
        <value></value>
        <description> </description>
        </property>

        <!-- parse-metatags plugin properties -->
        <property>
        <name>description;keywords</name>
        <value>*</value>
        <description> </description>
        </property>

        3. I read somewhere that we need to input
        <field name="metatag.description" type="string" stored="true" indexed="true"/>
        in schema.xml both in Solr and Nutch. Is that correct?

        4. I want to see my chosen metatags in MySQL, as I find it more useful for my queries. Any ideas how to implement this?

        5. I want to crawl a page for <meta og:video> or <meta twitter:image>. Any ideas?

        Talat UYARER added a comment -

        Hi Vangelis Karvounis,

        • The configuration in your first question is correct.
        • For the second question, you can configure it like this. The other properties are not necessary; I missed them and will update my patch accordingly.
          <property>
          <name>index.parse.md</name>
          <value>description,keywords</value>
          <description></description>
          </property>
          
          <!-- parse-metatags plugin properties -->
          <property>
          <name>metatags.names</name>
          <value>description;keywords</value>
          <description> </description>
          </property>
          
        • For the third question, you can use an asterisk to accept every generated field:
          <field name="meta_*" type="string" stored="true" indexed="true"/>
        • Fourth question: I don't know.
        • Fifth question: I am not sure. If you share a webpage, I can test it.

        Talat
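
        One caveat, based on stock Solr schema conventions rather than anything in this patch: a wildcard name such as meta_* is normally declared as a dynamicField rather than a plain field, for example:

            <!-- hypothetical catch-all declaration in schema.xml for the generated meta_* fields -->
            <dynamicField name="meta_*" type="string" stored="true" indexed="true"/>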

        Vangelis Karvounis added a comment -

        Thanks for the answer Talat!
        Let's say we crawl the url: http://www.uefa.com/worldcup/video/videoid=2064600.html?autoplay=true.

        Its page source contains:
        <!DOCTYPE html><html lang="en"><head prefix="og: http://ogp.me/ns# fb: http://ogp.me/ns/fb# video: http://ogp.me/ns/video# "><title>Veloso's World Cup dream for Portugal - FIFA World Cup - Video - UEFA.com</title><meta http-equiv="X-UA-Compatible" content="IE=edge" /><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><meta name="description" content=""It will be a unique and unforgettable event," Portugal's Miguel Veloso told UEFA.com as the FIFA World Cup in Brazil nears, but he knows they have been handed a tough group." /><meta name="keywords" content="velosos,world,cup,dream,portugal,Miguel Veloso,Portugal,Ukraine,Dynamo Kyiv" /><meta name="author" content="uefa.com" /><meta property="og:type" content="video.other" /><meta property="og:title" content="The official website for European football – UEFA.com" /><meta property="og:url" content="http://www.uefa.com/worldcup/video/videoid=2064600.html" /><meta property="og:image" content="http://www.uefa.com/MultimediaFiles/Photo/competitions/General/02/06/23/87/2062387_s2.jpg " /><meta property="og:description" content=""It will be a unique and unforgettable event," Portugal's Miguel Veloso told UEFA.com as the FIFA World Cup in Brazil nears, but he knows they have been handed a tough group." /><meta property="og:site_name" content="UEFA.com" /><meta property="video:release_date" content="2014-03-04T9:00Z" /><meta property="video:tag" content="velosos" /><meta property="video:tag" content="world" /><meta property="video:tag" content="cup" /><meta property="video:tag" content="dream" /><meta property="video:tag" content="portugal" /><meta property="video:tag" content="Miguel Veloso" /><meta property="video:tag" content="Portugal" /><meta property="video:tag" content="Ukraine" /><meta property="video:tag" content="Dynamo Kyiv" /><meta name="thumb" content="/multimediafiles/photo/competitions/general/02/06/23/87/2062387_s5.jpg" /><meta name="date" content="Tuesday 4 March 2014" /><meta name="isodate" content="2014-03-04" /><meta name="phototitle" content="Veluso" /><link rel="canonical" href="http://www.uefa.com/worldcup/video/videoid=2064600.html" /><link rel="image_src" href="http://www.uefa.com/multimediafiles/photo/competitions/general/02/06/23/87/2062387_s5.jpg"> </link><meta name="viewport" content="width=device-width, initial-scale=1.0" /><script type="text/javascript">

        I am interested in extracting the info <meta property="og:image" content="http://www.uefa.com/MultimediaFiles/Photo/competitions/General/02/06/23/87/2062387_s2.jpg " /> and/or the info <meta property="video:tag" content="cup" />.

        Do you think the parser can achieve this, or do we need to implement something else?

        Thank you in advance!

        Vangelis Karvounis added a comment -

        I use Eclipse, and with some changes I have managed to implement what I asked about in my previous question. I have some problems understanding how the patching process works. If I figure it out I will upload a patch, or I will upload something else that is explanatory!

        Vangelis Karvounis added a comment -

        I have made a patch but I don't know if I have done it correctly.
        Anyway, my goal here was to pick up both the property and rel tags. I would be glad if I could be of any help!
        Vangelis

        If you want to patch this version, you need to alter plugin/parse-metatags/MetaTagsParser.java from the latest v5 patch as follows:

        Add the following code just before 'return parse' inside the method ParseFilter(String url, WebPage page, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc):

        Properties property = metaTags.getPropertyTags();
        Enumeration<?> properNames = property.propertyNames();

        while (properNames.hasMoreElements()) {
          String name1 = (String) properNames.nextElement();
          String value1 = property.getProperty(name1);

          if (metatagset.contains("*") || metatagset.contains(name1.toLowerCase())) {
            LOG.debug("Found meta tag : " + name1 + "\t" + value1);
            page.putToMetadata(new Utf8(PARSE_META_PREFIX + name1.toLowerCase()),
                ByteBuffer.wrap(value1.getBytes()));
          }
        }

        Properties relProp = metaTags.getRelTags();
        Enumeration<?> relNames = relProp.propertyNames();

        while (relNames.hasMoreElements()) {
          String name2 = (String) relNames.nextElement();
          String value2 = relProp.getProperty(name2);

          if (metatagset.contains("*") || metatagset.contains(name2.toLowerCase())) {
            LOG.debug("Found meta tag : " + name2 + "\t" + value2);
            page.putToMetadata(new Utf8(PARSE_META_PREFIX + name2.toLowerCase()),
                ByteBuffer.wrap(value2.getBytes()));
          }
        }
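
        Since the two loops differ only in the Properties source, they could be folded into a single helper. A sketch, assuming it lives in the same class and can reuse the members already visible in the quoted code (LOG, PARSE_META_PREFIX); this is an illustration, not part of any attached patch:

            // hypothetical helper: store every matching name/value pair from the given Properties
            // into the page metadata, mirroring the two loops above
            private void addTagsToMetadata(Properties tags, Set<String> metatagset, WebPage page) {
              Enumeration<?> names = tags.propertyNames();
              while (names.hasMoreElements()) {
                String name = (String) names.nextElement();
                String value = tags.getProperty(name);
                if (metatagset.contains("*") || metatagset.contains(name.toLowerCase())) {
                  LOG.debug("Found meta tag : " + name + "\t" + value);
                  page.putToMetadata(new Utf8(PARSE_META_PREFIX + name.toLowerCase()),
                      ByteBuffer.wrap(value.getBytes()));
                }
              }
            }

            // usage in place of the two loops:
            addTagsToMetadata(metaTags.getPropertyTags(), metatagset, page);
            addTagsToMetadata(metaTags.getRelTags(), metatagset, page);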

        Talat UYARER added a comment -

        I updated the patch to remove the unnecessary configuration. Some trivial updates.

        Lewis John McGibbney added a comment -

        Hi Talat UYARER, let's have a couple of people try this patch out, then we can commit it if there are no problems. Thanks

        Lewis John McGibbney added a comment -

        I am a big fat +1 for the v6 patch to be committed. Tested and verified. All test cases are passing nicely as well.
        Anyone else?

        Vangelis Karvounis added a comment -

        +1. Very nice work

        Lewis John McGibbney added a comment -

        v6 patch committed @revision 1577143 in 2.x HEAD
        Thank you to everyone who worked on this issue. Everyone is credited in CHANGES.txt

        Hudson added a comment -

        SUCCESS: Integrated in Nutch-nutchgora #951 (See https://builds.apache.org/job/Nutch-nutchgora/951/)
        NUTCH-1478 Parse-metatags and index-metadata plugin for Nutch 2.x series (lewismc: http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1577143)

        • /nutch/branches/2.x/CHANGES.txt
        • /nutch/branches/2.x/build.xml
        • /nutch/branches/2.x/conf/nutch-default.xml
        • /nutch/branches/2.x/conf/schema.xml
        • /nutch/branches/2.x/src/plugin/build.xml
        • /nutch/branches/2.x/src/plugin/index-metadata
        • /nutch/branches/2.x/src/plugin/index-metadata/build.xml
        • /nutch/branches/2.x/src/plugin/index-metadata/ivy.xml
        • /nutch/branches/2.x/src/plugin/index-metadata/plugin.xml
        • /nutch/branches/2.x/src/plugin/index-metadata/src
        • /nutch/branches/2.x/src/plugin/index-metadata/src/java
        • /nutch/branches/2.x/src/plugin/index-metadata/src/java/org
        • /nutch/branches/2.x/src/plugin/index-metadata/src/java/org/apache
        • /nutch/branches/2.x/src/plugin/index-metadata/src/java/org/apache/nutch
        • /nutch/branches/2.x/src/plugin/index-metadata/src/java/org/apache/nutch/indexer
        • /nutch/branches/2.x/src/plugin/index-metadata/src/java/org/apache/nutch/indexer/metadata
        • /nutch/branches/2.x/src/plugin/index-metadata/src/java/org/apache/nutch/indexer/metadata/MetadataIndexer.java
        • /nutch/branches/2.x/src/plugin/parse-metatags
        • /nutch/branches/2.x/src/plugin/parse-metatags/README.txt
        • /nutch/branches/2.x/src/plugin/parse-metatags/build.xml
        • /nutch/branches/2.x/src/plugin/parse-metatags/ivy.xml
        • /nutch/branches/2.x/src/plugin/parse-metatags/plugin.xml
        • /nutch/branches/2.x/src/plugin/parse-metatags/sample
        • /nutch/branches/2.x/src/plugin/parse-metatags/sample/testMetatags.html
        • /nutch/branches/2.x/src/plugin/parse-metatags/sample/testMultivalueMetatags.html
        • /nutch/branches/2.x/src/plugin/parse-metatags/src
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/java
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/java/org
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/java/org/apache
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/java/org/apache/nutch
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/java/org/apache/nutch/parse
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/java/org/apache/nutch/parse/MetaTagsParser.java
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/test
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/test/org
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/test/org/apache
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/test/org/apache/nutch
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/test/org/apache/nutch/parse
        • /nutch/branches/2.x/src/plugin/parse-metatags/src/test/org/apache/nutch/parse/TestMetaTagsParser.java
        • /nutch/branches/2.x/src/test/org/apache/nutch/indexer/TestIndexingFilters.java
        Shanaka Jayasundera added a comment -

        Hi All,

        I've downloaded the latest code from the 2.x branch and tried to index metadata to Solr, but the Solr query results are not showing the metadata.

        However, parsechecker is working fine. Do I need to do any additional configuration to get the metadata in the Solr query results?

        $ ./bin/nutch parsechecker http://nutch.apache.org/
        fetching: http://nutch.apache.org/
        parsing: http://nutch.apache.org/
        contentType: text/html
        signature: b2bb805dcd51f12784190d58d619f0bc
        ---------
        Url
        ---------------
        http://nutch.apache.org/
        ---------
        Metadata
        ---------
        meta_forrest-version : 0.10-dev
        meta_generator : Apache Forrest
        meta_forrest-skin-name : nutch_rs_ : �
        meta_content-type : text/html; charset=UTF-8

        The command I'm using to crawl and index is:
        bin/crawl urls/seed.txt TestCrawl3.1 http://localhost:8983/solr/ 2

        I've not made many configuration changes; I've configured nutch-site.xml and gora.properties to use HBase and Gora.

        I'd appreciate it if anyone can help me identify the missing configuration.
        Thanks in advance.

        Lewis John McGibbney added a comment -

        Can you please take this to the user@ mailing list? Thank you


          People

          • Assignee:
            Unassigned
            Reporter:
            kiran
          • Votes:
            6
            Watchers:
            13
