  1. Nutch
  2. NUTCH-1553

Property 'indexer.delete.robots.noindex' not working when using parser-html.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6
    • Fix Version/s: 1.13
    • Component/s: indexer, parser
    • Labels: None

    Description

      Maybe I'm doing something wrong, but it seems to me that the NUTCH-1434 patch only works with the Tika parser. With parse-html, the "robots" meta tag is only populated if the parse-metatags plugin is enabled, and then it is stored under the "metatag." prefix. So parseData.getMeta("robots") returns nothing when Tika is not used.

      I guess the simplest solution would be to provide a fallback: if parseData.getMeta("robots") is null, read parseData.getMeta("metatag.robots") instead.
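
      The fallback described above could look roughly like the sketch below. It is only an illustration, not the attached patch: a plain Map stands in for the parse metadata (Map.get plays the role of parseData.getMeta), and the class and method names (RobotsMetaFallback, robotsDirective, isNoIndex) are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; a Map stands in for Nutch's parse metadata so the
// sketch runs without Nutch on the classpath.
final class RobotsMetaFallback {

    // Returns the "robots" value, falling back to "metatag.robots"
    // (the key used when parse-metatags populates the metadata).
    static String robotsDirective(Map<String, String> parseMeta) {
        String robots = parseMeta.get("robots");
        if (robots == null) {
            robots = parseMeta.get("metatag.robots");
        }
        return robots;
    }

    // True when either key carries a "noindex" directive (case-insensitive).
    static boolean isNoIndex(Map<String, String> parseMeta) {
        String robots = robotsDirective(parseMeta);
        return robots != null
            && robots.toLowerCase(Locale.ROOT).contains("noindex");
    }

    public static void main(String[] args) {
        Map<String, String> viaTika = new HashMap<>();
        viaTika.put("robots", "noindex,follow");

        Map<String, String> viaMetatags = new HashMap<>();
        viaMetatags.put("metatag.robots", "NOINDEX, nofollow");

        Map<String, String> noRobots = new HashMap<>();

        System.out.println(isNoIndex(viaTika));
        System.out.println(isNoIndex(viaMetatags));
        System.out.println(isNoIndex(noRobots));
    }
}
```

      With this shape, the check behaves the same whichever parser filled in the metadata, as long as parse-metatags is enabled when Tika is not used.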

      It would also be worth documenting somewhere (nutch-default.xml?) that this property depends on the parse-metatags plugin when parse-html is used.

      Thanks!

      Attachments

        1. NUTCH-1553-trunk-1.patch
          0.8 kB
          Fengtan

            People

              Assignee: Sebastian Nagel (snagel)
              Reporter: Alfonso Presa (alfonso.presa)
              Votes: 2
              Watchers: 6