Droids / DROIDS-4

NoRobotsClient does not follow the standard for robots.txt location

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      The URL for robots.txt is resolved relative to the path of the base URL, which does not follow the robots exclusion standard.

      ...
      public static URL findRobotsUrl(URL base, String prefix) throws MalformedURLException {
      URL url = new URL(base, "robots.txt");
      boolean exist = existUrl(url);
      ...

      It should be "new URL(base, "/robots.txt");"
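
To illustrate the difference, here is a minimal sketch of how the two specs resolve against the same base URL. The class name `RobotsUrlDemo`, the helper `resolve`, and the example.com address are assumptions made for this illustration; they are not part of Droids.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RobotsUrlDemo {
    // Hypothetical helper: resolves spec against base with URL(URL context, String spec),
    // the same constructor used in the snippet above.
    static String resolve(String base, String spec) throws MalformedURLException {
        return new URL(new URL(base), spec).toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical page URL with a non-root path.
        String base = "http://example.com/docs/guide/index.html";

        // Relative spec: resolved against the directory of the base path.
        System.out.println(resolve(base, "robots.txt"));

        // Spec with a leading slash: replaces the whole base path.
        System.out.println(resolve(base, "/robots.txt"));
    }
}
```

With a base URL whose path is not the site root, only the second form points at the site-wide robots.txt.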

      I found this on the web:

      I have attached a patch that fixes this behavior.

      Regards.

      1. norobot-rfc.diff
        3 kB
        Javier Puerto

          Activity

          Javier Puerto added a comment -

          >> It should be "new URL(base, "/robots.txt");"

          >In your patch, however, you use "new URL(getUrlPrefix(base) + "/robots.txt");".

          That was the first version.

          >The first suggestion is correct; the patch is not.

          I changed the code while writing the comment and forgot to update the attachment.

          Regards.

          Thorsten Scherler added a comment -

          Committed revision 703865.

          Thorsten Scherler added a comment -

          http://www.w3.org/TR/html4/appendix/notes.html#h-B.4.1.1
          "...There can only be a single "/robots.txt" on a site..."

          http://www.robotstxt.org/norobots-rfc.txt (sec 3.1)
          "...under a standard relative path on the server: "/robots.txt"."

          > It should be "new URL(base, "/robots.txt");"

          In your patch, however, you use "new URL(getUrlPrefix(base) + "/robots.txt");".

          The first suggestion is correct; the patch is not.

          The Javadoc for the constructor "URL(URL context, String spec)" says:

          "If the spec's path component begins with a slash character "/" then the path is treated as absolute and the spec path replaces the context path."

          This means there is no need to use getUrlPrefix(base). Furthermore, that method returns a String, and there is no URL(String, String) constructor.
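
As a rough sketch of that behaviour, the helper below relies only on URL(URL context, String spec) to locate robots.txt. The class `RobotsLocation`, the method `robotsUrlFor`, and the example.org address are hypothetical names chosen for illustration; this is not the committed Droids code and it leaves out the existUrl check.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RobotsLocation {
    // Hypothetical helper: the leading slash in the spec makes the path absolute,
    // so protocol, host and port come from base while its path is replaced.
    static URL robotsUrlFor(URL base) throws MalformedURLException {
        return new URL(base, "/robots.txt");
    }

    public static void main(String[] args) throws MalformedURLException {
        // Hypothetical base URL with a port, a nested path and a query string.
        URL base = new URL("https://example.org:8443/a/b/c.html?x=1");
        System.out.println(robotsUrlFor(base));
    }
}
```

No string concatenation or prefix extraction is needed, because the constructor already carries over the scheme and authority from the context URL.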

          I will apply the correct version now.

          Thanks Javier for spotting this and providing a patch.


            People

            • Assignee: Unassigned
            • Reporter: Javier Puerto
            • Votes: 0
            • Watchers: 0
