Nutch / NUTCH-2586

Add a fallback mechanism for missing meta tags

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.15
    • Fix Version/s: 1.21
    • Component/s: metadata, plugin
    • Labels: None

    Description

      While using Nutch, we ran into the following issue: some web pages lack a "description" meta tag but include an "og:description" meta tag (from the Open Graph protocol).

      Here are two examples:

      It would be nice to have a configurable list of fallback meta tags to use when the main meta tag is absent, something that would let us specify in the configuration: "when the 'description' meta tag is missing, use 'og:description'; when 'title' is missing, use 'og:title'; and so on."
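
      To make the request concrete, here is a minimal sketch of the lookup being asked for. It is not existing Nutch code: the class name MetaFallback, the methods parseRules and resolve, and the rule syntax "tag=fallback1,fallback2;..." are all hypothetical, chosen only for illustration. It assumes the parsed meta tags of a page are available as a simple name-to-value map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Nutch: resolves a meta tag with fallbacks.
class MetaFallback {

    // Parses a rule string such as
    // "description=og:description,twitter:description;title=og:title"
    // into a map from primary tag name to its ordered fallback names.
    // The rule syntax is an assumption made for this sketch.
    static Map<String, List<String>> parseRules(String spec) {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        for (String rule : spec.split(";")) {
            String[] parts = rule.split("=", 2);
            if (parts.length != 2) {
                continue; // skip malformed or empty rules
            }
            List<String> fallbacks = new ArrayList<>();
            for (String name : parts[1].split(",")) {
                if (!name.trim().isEmpty()) {
                    fallbacks.add(name.trim());
                }
            }
            rules.put(parts[0].trim(), fallbacks);
        }
        return rules;
    }

    // Returns the value of the primary tag if present and non-blank,
    // otherwise the first non-blank fallback value, otherwise null.
    static String resolve(String name, Map<String, String> metaTags,
                          Map<String, List<String>> rules) {
        String value = metaTags.get(name);
        if (value != null && !value.trim().isEmpty()) {
            return value;
        }
        List<String> fallbacks = rules.getOrDefault(name, Collections.emptyList());
        for (String fallback : fallbacks) {
            String candidate = metaTags.get(fallback);
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rules =
            parseRules("description=og:description;title=og:title");
        Map<String, String> metaTags = new HashMap<>();
        metaTags.put("og:description", "Text taken from the Open Graph tag");
        // No "description" tag on this page, so the fallback is used.
        System.out.println(resolve("description", metaTags, rules));
    }
}
```

      In this sketch the primary tag always wins when it is present, and the fallbacks are tried in the order they are listed, so a page that has both "title" and "og:title" keeps its "title". How such rules would actually be exposed in the Nutch configuration is left open here.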

          People

            Assignee: Unassigned
            Reporter: Gerard Bouchar (gbouchar)
            Votes: 1
            Watchers: 3
