  1. Nutch
  2. NUTCH-2546

parse-(metatags|html) plugin - "meta property" not extracted only "meta name"


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.15
    • Fix Version/s: 1.21
    • Component/s: parser
    • Labels: None

    Description

      The parse-(metatags|html) plugin extracts meta tags of the form "<meta name=", but tags of the form "<meta property=" are not processed.

      HTML e.g.:

      <meta property="og:title" content="Content in this property..."/> - not extracted
      <meta name="description" content="Content in this meta..."/> - OK

       

      When the parse-tika plugin is used for parsing instead, meta property fields are processed.

      <name>plugin.includes</name>

      <value>parse-(html|tika|metatags)...</value>
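
      The requested behaviour can be illustrated with a minimal, standalone sketch. This is not the Nutch plugin code: the class name MetaTagSketch and the method extractMeta are hypothetical, and the sketch assumes the input is well-formed XHTML so that the JDK's built-in XML parser can read it. It keys each meta element by its "name" attribute and falls back to "property" when "name" is absent, so both example tags from this report are collected.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of Nutch.
public class MetaTagSketch {

    /**
     * Collects the "content" of every meta element, keyed by its "name"
     * attribute or, when that is missing, by its "property" attribute.
     * Assumes the input is well-formed XHTML.
     */
    public static Map<String, String> extractMeta(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        Map<String, String> tags = new LinkedHashMap<>();
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);

            // getAttribute returns "" when the attribute is absent.
            String key = meta.getAttribute("name");
            if (key.isEmpty()) {
                key = meta.getAttribute("property");
            }

            // Skip meta elements with no usable key or no content attribute.
            if (!key.isEmpty() && meta.hasAttribute("content")) {
                tags.put(key.toLowerCase(Locale.ROOT), meta.getAttribute("content"));
            }
        }
        return tags;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head>"
            + "<meta property=\"og:title\" content=\"Content in this property...\"/>"
            + "<meta name=\"description\" content=\"Content in this meta...\"/>"
            + "</head></html>";
        extractMeta(page).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

      Preferring "name" over "property" is a design choice made for this sketch; a real fix would need to decide how to handle tags that carry both attributes.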

          People

            Assignee: Unassigned
            Reporter: Irinel (wpsadm)
            Votes: 0
            Watchers: 2
