Nutch / NUTCH-809

Parse-metatags plugin

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4, nutchgora
    • Fix Version/s: 1.5
    • Component/s: parser
    • Labels: None
    • Patch Info: Patch Available

    Description

      Parse-metatags plugin

      The parse-metatags plugin consists of an HtmlParseFilter that takes a list of metatag names as its parameter, with '*' as the default value. The names in the list are separated by ';'.
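
      The handling of the name list can be sketched as follows. This is only an illustration, not code from the attached patches: the class MetatagNames and its accepts method are hypothetical, and both the reading of '*' as "match every metatag" and the case-insensitive comparison are assumptions.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not taken from the NUTCH-809 patch.
final class MetatagNames {
    private final Set<String> names = new LinkedHashSet<>();
    private boolean matchAll = false;

    MetatagNames(String configured) {
        // Assumption: an unset property falls back to the documented default '*'.
        String value = (configured == null) ? "*" : configured;
        for (String raw : value.split(";")) {
            String name = raw.trim();
            if (name.isEmpty()) {
                continue; // ignore empty entries such as a trailing ';'
            }
            if (name.equals("*")) {
                matchAll = true; // assumption: '*' selects every metatag
            } else {
                // Assumption: names are compared case-insensitively.
                names.add(name.toLowerCase(Locale.ROOT));
            }
        }
    }

    // Returns true if a metatag with this name should be extracted.
    boolean accepts(String metatagName) {
        return matchAll || names.contains(metatagName.toLowerCase(Locale.ROOT));
    }
}
```

      With the value description;keywords, such a helper would accept a keywords metatag and reject an author metatag; with no value configured it would accept both.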

      To extract the values of the description and keywords metatags, set the following in nutch-site.xml:

      <property>
        <name>metatags.names</name>
        <value>description;keywords</value>
      </property>
      

      The MetatagIndexer uses the output of the parsing step above to create two fields, 'keywords' and 'description'. Note that 'keywords' is multivalued.
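
      The difference between the single-valued and the multivalued field can be sketched as below. This is a hypothetical illustration rather than the MetatagIndexer from the patch: the class MetatagFields is invented here, and splitting the keywords content on commas is an assumption about how the multiple values are obtained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the MetatagIndexer from the patch.
final class MetatagFields {
    // Maps parsed metatag content to index fields (field name -> values).
    static Map<String, List<String>> toFields(String description, String keywords) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        if (description != null && !description.trim().isEmpty()) {
            // 'description' carries a single value.
            List<String> single = new ArrayList<>();
            single.add(description.trim());
            fields.put("description", single);
        }
        if (keywords != null) {
            List<String> values = new ArrayList<>();
            // Assumption: keywords arrive as one comma-separated string.
            for (String raw : keywords.split(",")) {
                String keyword = raw.trim();
                if (!keyword.isEmpty()) {
                    values.add(keyword); // one field value per keyword
                }
            }
            if (!values.isEmpty()) {
                fields.put("keywords", values);
            }
        }
        return fields;
    }
}
```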

      The query-basic plugin is used to include these fields in the search, e.g. in nutch-site.xml:

      <property>
        <name>query.basic.description.boost</name>
        <value>2.0</value>
      </property>
      
      <property>
        <name>query.basic.keywords.boost</name>
        <value>2.0</value>
      </property>
      

      This code was developed by DigitalPebble Ltd and offered to the community by ANT.com.

      Attachments

        1. metatags-plugin+tutorial.zip
          29 kB
          Elisabeth Adler
        2. NUTCH-809_metatags_1.3.patch
          14 kB
          Elisabeth Adler
        3. NUTCH-809.patch
          20 kB
          Julien Nioche
        4. NUTCH-809-trunk.patch
          15 kB
          Julien Nioche

            People

              Assignee: Julien Nioche (jnioche)
              Reporter: Julien Nioche (jnioche)
              Votes: 2
              Watchers: 1