Nutch / NUTCH-1478

Parse-metatags and index-metadata plugin for Nutch 2.x series

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.1
    • Fix Version/s: 2.3
    • Component/s: parser
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      I have ported the parse-metatags and index-metadata plugins to the Nutch 2.x series. As in my earlier patch (https://issues.apache.org/jira/browse/NUTCH-1467), multiple values of the same tag are collected and indexed in Solr.

      The usage is the same as described here (http://wiki.apache.org/nutch/IndexMetatags), with one change: there is no need to put the 'metatag' keyword before metatag names. For example, my configuration looks like this (https://github.com/salvager/NutchDev/blob/master/runtime/local/conf/nutch-site.xml).
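
      To illustrate the naming change, here is a minimal sketch that rewrites a comma-separated list of prefixed names into the unprefixed form this port expects. The exact prefix spelling ('metatag.' with a trailing dot) and the helper name strip_metatag_prefix are assumptions made for illustration; they are not taken from the patch.

```python
def strip_metatag_prefix(value, prefix="metatag."):
    """Drop the assumed 'metatag.' prefix from each comma-separated name."""
    # Split on commas and ignore empty entries and surrounding whitespace.
    names = [n.strip() for n in value.split(",") if n.strip()]
    # Remove the prefix only where it is present; other names pass through.
    cleaned = [n[len(prefix):] if n.startswith(prefix) else n for n in names]
    return ",".join(cleaned)


# Hypothetical prefixed value, converted to the unprefixed style.
print(strip_metatag_prefix("metatag.description, metatag.keywords"))
```

      Names that carry no prefix are left unchanged, so the helper can be run safely over a value that is already in the new style.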

      This is only the first version and does not include JUnit tests. I will upload an updated version soon.

      This will parse the tags and index them in Solr. Make sure that every field listed under 'index.parse.md' in nutch-site.xml is also defined in Solr's schema.xml.
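
      A quick way to check this is a small script that compares the two files. The sketch below is not part of the patch: the helper names, the sample field names (description, keywords) and the sample XML are assumptions, and it only looks at explicit <field> entries, so a name covered by a dynamicField would still be reported as missing.

```python
import xml.etree.ElementTree as ET


def indexed_metadata_fields(nutch_site_xml, prop="index.parse.md"):
    """Return the comma-separated names configured under `prop`."""
    root = ET.fromstring(nutch_site_xml)
    for node in root.iter("property"):
        if (node.findtext("name") or "").strip() == prop:
            value = node.findtext("value") or ""
            return [f.strip() for f in value.split(",") if f.strip()]
    return []


def schema_fields(schema_xml):
    """Return the names of all explicit <field> elements in schema.xml."""
    root = ET.fromstring(schema_xml)
    return {f.get("name") for f in root.iter("field") if f.get("name")}


def missing_in_schema(nutch_site_xml, schema_xml):
    """List configured metadata fields that schema.xml does not declare."""
    declared = schema_fields(schema_xml)
    return [f for f in indexed_metadata_fields(nutch_site_xml) if f not in declared]


# Hypothetical configuration: two metatags are to be indexed.
NUTCH_SITE = """<configuration>
  <property>
    <name>index.parse.md</name>
    <value>description,keywords</value>
  </property>
</configuration>"""

# Hypothetical schema: 'keywords' has not been declared yet.
# multiValued is set here because a tag may carry several values.
SCHEMA = """<schema name="nutch" version="1.5">
  <fields>
    <field name="id" type="string" stored="true" indexed="true"/>
    <field name="description" type="string" stored="true" indexed="true" multiValued="true"/>
  </fields>
</schema>"""

print(missing_in_schema(NUTCH_SITE, SCHEMA))  # ['keywords']
```

      An empty list means every configured field has a matching declaration in schema.xml.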

      Please let me know if you have any suggestions.

      This is supported by DLA (Digital Library and Archives) of Virginia Tech.

      1. NUTCH-1478v6.patch
        34 kB
        Talat UYARER
      2. NUTCH-1478v5.1.patch
        6 kB
        Vangelis Karvounis
      3. NUTCH-1478v5.patch
        36 kB
        Talat UYARER
      4. NUTCH-1478v4.patch
        29 kB
        Yasin Kılınç
      5. NUTCH-1478v3.patch
        30 kB
        Lewis John McGibbney
      6. NUTCH-1478-parse-v2.patch
        17 kB
        Tien Nguyen Manh
      7. metadata_parseChecker_sites.png
        280 kB
        kiran
      8. Nutch1478.zip
        13 kB
        kiran
      9. Nutch1478.patch
        8 kB
        kiran

          People

          • Assignee: Unassigned
          • Reporter: kiran
          • Votes: 6
          • Watchers: 12
