  1. ManifoldCF
  2. CONNECTORS-1392

Add option for Web connector to ignore robots instructions in meta tags

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: ManifoldCF 2.7
    • Component/s: Web connector
    • Labels:
      None

      Description

      The Web connector already has an option to ignore robots.txt.

      This ticket adds a second option that lets the connector ignore robots instructions in <meta name="robots ... tags.

      Proposal (to be discussed)

      Add a new option list "Page level robots instructions" to the "Robots" Tab. List entries:

      1. Obey meta robots tags (the default)
      2. Don't look at meta robots tags
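
      The two list entries could be evaluated roughly as in the following Java sketch. It is only an illustration of the proposal, not the code from the attached patches: the class MetaRobotsSketch, the enum MetaRobotsUsage, and the method decide are hypothetical names, and the parsing assumes the comma-separated directive syntax (noindex, nofollow, none) described in the Google resources listed in this ticket.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch; none of these names come from the ManifoldCF code base. */
final class MetaRobotsSketch {

    /** Mirrors the two proposed entries of "Page level robots instructions". */
    enum MetaRobotsUsage { OBEY, IGNORE }

    /** Outcome of evaluating the content attribute of a meta robots tag. */
    static final class Decision {
        final boolean index;   // may the page content be indexed?
        final boolean follow;  // may links found on the page be followed?

        Decision(boolean index, boolean follow) {
            this.index = index;
            this.follow = follow;
        }
    }

    /**
     * @param metaContent value of the content attribute, or null if the page has no meta robots tag
     * @param usage       the option selected for the job (assumed to default to OBEY)
     */
    static Decision decide(String metaContent, MetaRobotsUsage usage) {
        // With IGNORE selected, or with no tag present, nothing restricts the crawler.
        if (usage == MetaRobotsUsage.IGNORE || metaContent == null) {
            return new Decision(true, true);
        }
        // Directives are comma separated and case insensitive.
        Set<String> directives = Arrays.stream(metaContent.toLowerCase(Locale.ROOT).split(","))
            .map(String::trim)
            .collect(Collectors.toSet());
        // "none" is shorthand for "noindex, nofollow".
        boolean none = directives.contains("none");
        boolean index = !(none || directives.contains("noindex"));
        boolean follow = !(none || directives.contains("nofollow"));
        return new Decision(index, follow);
    }
}
```

      Under these assumptions, a page carrying content="noindex, nofollow" is neither indexed nor followed when the option is set to obey, while the same page is treated like any other page when the option is set to ignore.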

      The end-user documentation needs to be updated.

      Google resources on robots instructions in HTML pages:
      [0] https://support.google.com/webmasters/answer/79812?hl=en&ctx=cb&src=cb&cbid=tnnsjq5jcodt&cbrank=4
      [1] https://support.google.com/webmasters/answer/96569?hl=en&ctx=cb&src=cb&cbid=-5rmggrfsp2rq&cbrank=3
      [2] https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag?csw=1

      Thread on the mailing list
      [3] https://www.mail-archive.com/user@manifoldcf.apache.org/msg03258.html

        Attachments

        1. CONNECTORS-1392.patch
          9 kB
          Markus Schuch
        2. CONNECTORS-1392-2.patch
          11 kB
          Markus Schuch
        3. CONNECTORS-1392-3.patch
          18 kB
          Markus Schuch
        4. web-configure-robots.PNG
          40 kB
          Markus Schuch

            People

            • Assignee: Markus Schuch
            • Reporter: Markus Schuch