
INFRA-1106: robots.txt on wiki.apache.org blocks Google (again)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: Initial Clearing
    • Component/s: MoinMoin
    • Labels: None

      Description

      A couple of years ago, https://issues.apache.org/jira/browse/INFRA-361 was marked "fixed", allowing Google to spider the wiki sites.

      However, while searching for some SpamAssassin documentation today, I noticed that virtually no wiki hits were appearing -- especially odd since our documentation and FAQ are primarily wiki-based.

      A check of http://wiki.apache.org/robots.txt reveals the cause:

      User-agent: *
      Disallow: /

      Can this be fixed? We rely on our user population being able to find documentation and FAQs via Google. Should we move this off ASF infrastructure if there are persistent load problems?

        Activity

        Justin Mason added a comment -
        Can someone add a comment to the file asking that a list be mailed before it is changed in future? A little warning would go a long way.
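        (A sketch of the kind of warning header being asked for, written as robots.txt comment lines; the list address is a hypothetical placeholder, since the ticket only says "some list":)

        # NOTE: please mail the infrastructure list (hypothetical address:
        # infrastructure@apache.org) BEFORE changing this file.
        # Projects rely on their wikis being crawlable by search engines.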
        Henri Yandell added a comment -
        Looks like this issue can be closed.
        Paul Querna added a comment -
        I've changed the robots.txt to allow everything. I don't really know that we can make any kind of guarantee for the wiki service. If you want to ensure your documentation is always available, my recommendation would be to store it in SVN as static files and then put it on the standard webserver space, where we can keep copies on multiple live servers... (and so it's cheap to serve... wiki CGIs are... expensive).
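        (For reference, "allow everything" is conventionally expressed in robots.txt with an empty Disallow value; the exact contents of the updated file are not quoted in this ticket, but it presumably looked something like this:)

        User-agent: *
        Disallow: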
        Justin Mason added a comment -
        If that'd be possible, it'd be great; however, the other half of the issue is the regression.

        It's essential for us that our users be able to find our documentation -- the wiki is the canonical location we use for that. If it's not googleable, we'll have to move it.

        Can we get a guarantee that it'll be accessible to Google in future?
        Paul Querna added a comment -
        It was done in the past due to server load caused by the wiki. Now that the wiki is on the t2000, we can likely take it off.

          People

          • Assignee: Unassigned
          • Reporter: Justin Mason
          • Votes: 0
          • Watchers: 0
