Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4, nutchgora
    • Fix Version/s: 1.5
    • Component/s: parser
    • Labels:
      None
    • Patch Info:
      Patch Available

      Description

      Parse-metatags plugin

      The parse-metatags plugin consists of an HtmlParseFilter that takes as a parameter a list of metatag names, separated by ';', with '*' as the default value.

      To extract the values of the description and keywords metatags, specify the following in nutch-site.xml:

      <property>
        <name>metatags.names</name>
        <value>description;keywords</value>
      </property>
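      As an illustration of how a metatags.names value such as "description;keywords" could be interpreted, here is a minimal Java sketch. It is not the plugin's actual code: the class and method names (MetatagSelector, parseNames, select) are hypothetical, and it assumes that metatag names are matched case-insensitively and that the page's metatags are already available as a name-to-value map.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of interpreting the metatags.names property;
// not the parse-metatags plugin's real implementation.
public class MetatagSelector {

    // Splits the ';'-separated list; a missing or empty value falls back to "*".
    static Set<String> parseNames(String value) {
        Set<String> names = new LinkedHashSet<>();
        if (value != null) {
            for (String part : value.split(";")) {
                // Assumption: names are compared case-insensitively.
                String name = part.trim().toLowerCase(Locale.ROOT);
                if (!name.isEmpty()) {
                    names.add(name);
                }
            }
        }
        if (names.isEmpty()) {
            names.add("*"); // default: keep every metatag
        }
        return names;
    }

    // Keeps only the requested metatags; "*" keeps all of them.
    static Map<String, String> select(Map<String, String> pageMetatags, Set<String> names) {
        boolean all = names.contains("*");
        Map<String, String> selected = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : pageMetatags.entrySet()) {
            String name = e.getKey().toLowerCase(Locale.ROOT);
            if (all || names.contains(name)) {
                selected.put(name, e.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, String> page = new LinkedHashMap<>();
        page.put("Description", "A page about crawling");
        page.put("keywords", "nutch, crawler, search");
        page.put("robots", "index,follow");

        Set<String> names = parseNames("description;keywords");
        System.out.println(select(page, names));
        // prints {description=A page about crawling, keywords=nutch, crawler, search}
    }
}
```

With the default '*', the robots metatag in the example would be kept as well.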
      

      The MetatagIndexer uses the output of the parsing step above to create two fields, 'keywords' and 'description'. Note that 'keywords' is multivalued.
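      A minimal sketch of how the parsed values could be mapped onto the two index fields, with 'keywords' turned into multiple values. MetatagFields and toFields are hypothetical names, and splitting the keywords metatag on ',' is an assumption about its separator; the actual MetatagIndexer may behave differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: builds index fields from selected metatag values.
// "description" stays single-valued; "keywords" becomes a list of values.
public class MetatagFields {

    static Map<String, List<String>> toFields(Map<String, String> metatags) {
        Map<String, List<String>> fields = new LinkedHashMap<>();

        String description = metatags.get("description");
        if (description != null && !description.trim().isEmpty()) {
            List<String> single = new ArrayList<>();
            single.add(description.trim());
            fields.put("description", single);
        }

        String keywords = metatags.get("keywords");
        if (keywords != null) {
            List<String> values = new ArrayList<>();
            // Assumption: keywords are comma-separated; blanks are dropped.
            for (String kw : keywords.split(",")) {
                String v = kw.trim();
                if (!v.isEmpty()) {
                    values.add(v);
                }
            }
            if (!values.isEmpty()) {
                fields.put("keywords", values);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> metatags = new LinkedHashMap<>();
        metatags.put("description", "A page about crawling");
        metatags.put("keywords", "nutch, crawler, , search");
        System.out.println(toFields(metatags));
        // prints {description=[A page about crawling], keywords=[nutch, crawler, search]}
    }
}
```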

      The query-basic plugin can be used to include these fields in searches, e.g. by setting the following in nutch-site.xml:

      <property>
        <name>query.basic.description.boost</name>
        <value>2.0</value>
      </property>
      
      <property>
        <name>query.basic.keywords.boost</name>
        <value>2.0</value>
      </property>
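      To show what a boost of 2.0 means, here is a hedged Java sketch that expands one search term into a query over several fields using Lucene's field:term^boost syntax. It is not how query-basic is implemented; the class name BoostedQuery and the 'content' field with the default boost of 1.0 are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a single term searched across several fields,
// with matches in boosted fields weighted more heavily.
public class BoostedQuery {

    static String expand(String term, Map<String, Float> boosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : boosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(term);
            // A boost of 1.0 is the default, so it is left implicit.
            if (e.getValue() != 1.0f) {
                sb.append('^').append(e.getValue());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("content", 1.0f);
        boosts.put("description", 2.0f); // query.basic.description.boost
        boosts.put("keywords", 2.0f);    // query.basic.keywords.boost
        System.out.println(expand("nutch", boosts));
        // prints content:nutch description:nutch^2.0 keywords:nutch^2.0
    }
}
```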
      

      This code was developed by DigitalPebble Ltd and contributed to the community by ANT.com.

      1. NUTCH-809-trunk.patch
        15 kB
        Julien Nioche
      2. metatags-plugin+tutorial.zip
        29 kB
        Elisabeth Adler
      3. NUTCH-809_metatags_1.3.patch
        14 kB
        Elisabeth Adler
      4. NUTCH-809.patch
        20 kB
        Julien Nioche

            People

            • Assignee: Julien Nioche
            • Reporter: Julien Nioche
            • Votes: 2
            • Watchers: 2
