Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4, nutchgora
    • Fix Version/s: 1.5
    • Component/s: parser
    • Labels: None
    • Patch Info: Patch Available

      Description

      Parse-metatags plugin

      The parse-metatags plugin consists of an HTMLParserFilter that takes a list of metatag names as its parameter. The names are separated by ';', and the default value is '*'.

      To extract the values of the description and keywords metatags, specify the following in nutch-site.xml:

      <property>
        <name>metatags.names</name>
        <value>description;keywords</value>
      </property>
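
As a rough illustration of how a value such as the one above could be interpreted, here is a minimal Java sketch. It is not taken from the patch: the class `MetatagNames` and its methods are hypothetical, and both the lower-casing of names and the reading of '*' as "accept every metatag" are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper, not part of the patch: interprets a metatags.names value. */
public class MetatagNames {

    /** Splits a ';'-separated value into trimmed, lower-cased names; blank input falls back to "*". */
    public static List<String> parse(String value) {
        List<String> names = new ArrayList<>();
        if (value == null || value.trim().isEmpty()) {
            names.add("*"); // default value named in the description
            return names;
        }
        for (String token : value.split(";")) {
            String name = token.trim().toLowerCase(Locale.ROOT); // assumption: case-insensitive names
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Assumption: "*" accepts every metatag; otherwise the name must be listed. */
    public static boolean accepts(List<String> names, String metatag) {
        return names.contains("*") || names.contains(metatag.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        List<String> names = parse("description;keywords");
        System.out.println(names);
        System.out.println(accepts(names, "Keywords"));
        System.out.println(accepts(names, "author"));
    }
}
```

With the configuration shown above, only the description and keywords metatags would pass `accepts`. An unset or blank value would fall back to '*'.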
      

      The MetatagIndexer uses the output of the parsing step above to create two fields, 'keywords' and 'description'. Note that 'keywords' is multivalued.
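
To make the single-valued versus multivalued distinction concrete, the following Java sketch maps parsed metatag values onto index fields. It is only a stand-in: it uses plain Java collections instead of the Nutch API, the class `MetatagFieldsSketch` is hypothetical, and splitting keywords on commas is an assumption about how the multiple values arise.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the indexing step; it does not use the Nutch API. */
public class MetatagFieldsSketch {

    /** Builds index fields (name -> list of values) from parsed metatag values. */
    public static Map<String, List<String>> toFields(Map<String, String> metatags) {
        Map<String, List<String>> fields = new LinkedHashMap<>();

        // 'description' becomes a field with a single value.
        String description = metatags.get("description");
        if (description != null && !description.trim().isEmpty()) {
            fields.put("description", Collections.singletonList(description.trim()));
        }

        // 'keywords' becomes a multivalued field.
        String keywords = metatags.get("keywords");
        if (keywords != null) {
            List<String> values = new ArrayList<>();
            // Assumption: keywords are comma-separated in the page's meta tag.
            for (String keyword : keywords.split(",")) {
                String trimmed = keyword.trim();
                if (!trimmed.isEmpty()) {
                    values.add(trimmed);
                }
            }
            if (!values.isEmpty()) {
                fields.put("keywords", values);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put("description", "Open source web crawler");
        parsed.put("keywords", "nutch, crawler, search");
        System.out.println(toFields(parsed));
    }
}
```

In this sketch a page with three keywords yields a 'keywords' field holding three separate values, while 'description' always holds at most one.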

      The query-basic plugin is used to include these fields in the search, e.g. in nutch-site.xml:

      <property>
        <name>query.basic.description.boost</name>
        <value>2.0</value>
      </property>
      
      <property>
        <name>query.basic.keywords.boost</name>
        <value>2.0</value>
      </property>
      

      This code was developed by DigitalPebble Ltd and offered to the community by ANT.com.

        Attachments

        1. NUTCH-809-trunk.patch
          15 kB
          Julien Nioche
        2. NUTCH-809.patch
          20 kB
          Julien Nioche
        3. NUTCH-809_metatags_1.3.patch
          14 kB
          Elisabeth Adler
        4. metatags-plugin+tutorial.zip
          29 kB
          Elisabeth Adler

              People

              • Assignee:
                Julien Nioche (jnioche)
              • Reporter:
                Julien Nioche (jnioche)
              • Votes:
                2
              • Watchers:
                2