Nutch / NUTCH-1553

Property 'indexer.delete.robots.noindex' not working when using parser-html.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.6
    • Fix Version/s: 1.13
    • Component/s: indexer, parser
    • Labels: None

      Description

      Maybe I'm doing something wrong, but it seems to me that the NUTCH-1434 patch only works when using the Tika parser. When using parse-html, the "robots" meta tag is only populated if the parse-metatags plugin is enabled, and then it is stored with the prefix "metatag.". So parseData.getMeta("robots") returns nothing when Tika is not used.

      I guess the simplest solution would be to provide a fallback: if parseData.getMeta("robots") is null, read parseData.getMeta("metatag.robots") instead.
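
A minimal sketch of the fallback lookup described above. It is not the attached patch and does not use Nutch classes: a plain `Map<String, String>` stands in for the parse metadata, and the class name `RobotsMetaFallback` and both method names are hypothetical. Matching the directive with a case-insensitive `contains("noindex")` is also an assumption about how the value is checked.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; a Map stands in for Nutch's parse metadata.
public class RobotsMetaFallback {

    /** Returns the robots directive: the plain key first, then the "metatag." prefixed key. */
    public static String robotsDirective(Map<String, String> parseMeta) {
        String robots = parseMeta.get("robots");
        if (robots == null) {
            // Fallback for the case where only parse-metatags populated the value.
            robots = parseMeta.get("metatag.robots");
        }
        return robots;
    }

    /** True if the resolved directive contains "noindex" (assumed case-insensitive match). */
    public static boolean isNoIndex(Map<String, String> parseMeta) {
        String robots = robotsDirective(parseMeta);
        return robots != null && robots.toLowerCase(Locale.ROOT).contains("noindex");
    }

    public static void main(String[] args) {
        Map<String, String> onlyPrefixed = new HashMap<>();
        onlyPrefixed.put("metatag.robots", "NOINDEX, follow");
        System.out.println(isNoIndex(onlyPrefixed));

        Map<String, String> noRobots = new HashMap<>();
        System.out.println(isNoIndex(noRobots));
    }
}
```

With this shape, a document whose metadata carries only "metatag.robots" is treated the same as one that carries "robots" directly, and a document with neither key is never flagged.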

      Also, the dependency of this property on the parse-metatags plugin when using parse-html would be worth documenting somewhere (nutch-default.xml?).

      Thanks!

        Attachments

        1. NUTCH-1553-trunk-1.patch
          0.8 kB
          Fengtan

              People

              • Assignee: Sebastian Nagel (snagel)
              • Reporter: Alfonso Presa (alfonso.presa)
              • Votes: 2
              • Watchers: 6
