  1. Nutch
  2. NUTCH-2546

parse-(metatags|html) plugin - "meta property" not extracted only "meta name"


    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.15
    • Fix Version/s: 1.19
    • Component/s: parser
    • Labels: None

      Description

      The parse-(metatags|html) plugin extracts meta tags of the form "<meta name=", but tags of the form "<meta property=" are not processed.

      HTML e.g.:

      <meta property="og:title" content="Content in this property..."/> - not extracted
      <meta name="description" content="Content in this meta..."/> - OK
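
      To illustrate the requested behavior, here is a minimal sketch in plain Java. It is not the Nutch parse-html or parse-metatags code; the class name MetaTagSketch and the method collectMeta are hypothetical, and it assumes the markup is well-formed XHTML so that the JDK XML parser can read it. It keys each meta tag by its "name" attribute and falls back to "property" when "name" is absent.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of Nutch.
public class MetaTagSketch {

  /**
   * Collects meta tags from well-formed XHTML, keyed by the "name"
   * attribute, or by "property" when "name" is missing.
   */
  public static Map<String, String> collectMeta(String xhtml) {
    Map<String, String> tags = new LinkedHashMap<>();
    try {
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
      NodeList metas = doc.getElementsByTagName("meta");
      for (int i = 0; i < metas.getLength(); i++) {
        Element meta = (Element) metas.item(i);
        // getAttribute returns an empty string when the attribute is absent.
        String key = meta.getAttribute("name");
        if (key.isEmpty()) {
          key = meta.getAttribute("property");
        }
        if (!key.isEmpty() && meta.hasAttribute("content")) {
          tags.put(key, meta.getAttribute("content"));
        }
      }
    } catch (Exception e) {
      // Wrap checked parser exceptions so callers need no throws clause.
      throw new IllegalArgumentException("Could not parse input", e);
    }
    return tags;
  }

  public static void main(String[] args) {
    String html = "<html><head>"
        + "<meta property=\"og:title\" content=\"Content in this property...\"/>"
        + "<meta name=\"description\" content=\"Content in this meta...\"/>"
        + "</head></html>";
    // Both tags are collected, regardless of which attribute carries the key.
    System.out.println(collectMeta(html));
  }
}
```

      With this fallback, both example tags above would end up in the extracted metadata, which is the behavior this issue asks for.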

       

      When the parse-tika plugin is used for parsing instead, meta property fields are processed.

      <name>plugin.includes</name>
      <value>parse-(html|tika|metatags)...</value>

            People

            • Assignee: Unassigned
            • Reporter: wpsadm Irinel
