Infrastructure
  1. Infrastructure
  2. INFRA-5367

Apache mirror.cgi - does not recognize Preferred mirror URL ending with slash

    Details

      Description

      There is a BUG in the python script for:

         http://www.apache.org/dyn/mirrors/mirrors.cgi

      that prevents web pages (i.e. XERCES mirror downloads) from specifying a specific mirror from a selection of mirrors. This issue is also common to other Apache projects.

      The problem is that the list of mirrors is populated from the database file "mirrors.list" that have the URI's
      appended with a slash. The python script that looks for a Preferred mirror expects the parameter to NOT HAVE an appended slash.

      It looks like the Apache mirror.cgi python script needs to be fixed.
      This code snip shows where the '/' is being mistreated.

      Here is the snip of code from Apache dyn/mirrors/mirrors.cgi that has the issue:

        # Check if the requested Preferred mirror is in the list
        # Note the user-requested mirror doesn't have a trailing-slash
        prefmir = None
        if preferred:
          for mir in mirrors:
            if mir[2][:-1] == preferred:
              prefmir = mir
              break
        # Otherwise pick a preferred mirror from our country

      --

      This snip is from the Xerces page that creates a form to select a preferred mirror. This should allow the user to select a specific mirror for downloads. The resulting GET method invokes the mirror.cgi with a URL that looks like:

         "xerces.apache.org/mirrors.cgi&Preferred="the preferred mirror string/"

      Note that "the preferred mirror string/" is uri-encoded and ends with a slash.

      <p>You are currently using the <strong>[preferred]</strong> mirror.
      If you encounter a problem with this mirror, please select another mirror.
      If all mirrors are failing, there are <em>backup</em> mirrors
      (at the end of the mirrors list) that should be available.</p>

      <a name="SelectMirror"></a>
      <form action="[location]" method="get" id="SelectMirror">Other mirrors:
      <select name="Preferred">
      <!--[if-any http] [for http]--><option selected="selected"
      value="[http]">[http]</option>
      <!--[end] [end]-->
      <!--[if-any ftp] [for ftp]--><option value="[ftp]">[ftp]</option>
      <!--[end] [end]-->
      <!--[if-any backup] [for backup]--><option value="[backup]">[backup]
      (backup)</option>
      <!--[end] [end]--></select> <input value="Change" type="submit">
      </form>

      --

      Sincerely,
      Steven J. Hathaway

        Activity

        Hide
        #asfinfra IRC Bot added a comment -
        <danielsh> Ideally we'll tolerate mirrors either with or without a trailing slash. Can you perhaps suggest a patch to this effect?
        Show
        #asfinfra IRC Bot added a comment - <danielsh> Ideally we'll tolerate mirrors either with or without a trailing slash. Can you perhaps suggest a patch to this effect?
        Hide
        Tony Stevenson added a comment -
        pending user feedback, will close in 72 hours if no feedback.
        Show
        Tony Stevenson added a comment - pending user feedback, will close in 72 hours if no feedback.
        Hide
        Steven J. Hathaway added a comment -
        Here is the snip of code from Apache dyn/mirrors/mirrors.cgi that has the issue:

          # Check if the requested Preferred mirror is in the list
          # Note the user-requested mirror doesn't have a trailing-slash
          prefmir = None
          if preferred:
            for mir in mirrors:
              if mir[2][:-1] == preferred:
                prefmir = mir
                break
          # Otherwise pick a preferred mirror from our country

        ---
        Here is my Python replacement that should make the trailing slash optional
        for the requested Preferred mirror.
          
          # Check if the requested Preferred mirror is in the list.
          # The user requested mirror may have an optional trailing slash.
          prefmir = None
          if preferred:
            for mir in mirrors:
              if preferred[-1] == '/':
                # The user requested mirror has a trailing slash
                if mir[2] == preferred:
                  prefmir = mir
                  break
              elif mir[2][:-1] == preferred:
                # The user requested mirror has no trailing slash and match found
                prefmir = mir
                break
          # Otherwise pick a preferred mirror from our country

        -
        I HAVE NOT TESTED THIS CODE
        Show
        Steven J. Hathaway added a comment - Here is the snip of code from Apache dyn/mirrors/mirrors.cgi that has the issue:   # Check if the requested Preferred mirror is in the list   # Note the user-requested mirror doesn't have a trailing-slash   prefmir = None   if preferred:     for mir in mirrors:       if mir[2][:-1] == preferred:         prefmir = mir         break   # Otherwise pick a preferred mirror from our country --- Here is my Python replacement that should make the trailing slash optional for the requested Preferred mirror.      # Check if the requested Preferred mirror is in the list.   # The user requested mirror may have an optional trailing slash.   prefmir = None   if preferred:     for mir in mirrors:       if preferred[-1] == '/':         # The user requested mirror has a trailing slash         if mir[2] == preferred:           prefmir = mir           break       elif mir[2][:-1] == preferred:         # The user requested mirror has no trailing slash and match found         prefmir = mir         break   # Otherwise pick a preferred mirror from our country - I HAVE NOT TESTED THIS CODE
        Hide
        Henk Penning added a comment -
        Urls from 'mirrors.list' always contain a trailing '/' ; other software complains hourly if the don't.

        In http://xerces.apache.org/mirrors.cgi I see lines like :
        <option value="http://mirrors.supportex.net/apache/">http://mirrors.supportex.net/apache/&lt;/option>
        So those are ok.

        I think that :
          if mir[2][:-1] == preferred:
        should just be :
          if mir[2] == preferred:

        Some form of 'trailing slash' problem has been with us repeatedly ; I remember a change was made
        after a comlaint about 'cross site scripting', but I don't remember the details.

        In the xerces download page I see '//'-en, because it contains lines like
        href="[preferred]/xerces/c/3/sources/xerces-c-3.1.1.tar.gz"
        The script could substitute "preferred" without the trailing slash, but that could possibly break stuff.

        It's a pity (for other reasons too) that mirrors.cgi only recognizes the "[preferred]" and has no notion
        of the 'full" url, or "file-part" of the reference ; if it had, and the "file-part" doesn't exists in /dist/ (and
        hence doesn't exist on the mirrors, it could refer to archive.a.o (if it exists there).

        HPP
        Show
        Henk Penning added a comment - Urls from 'mirrors.list' always contain a trailing '/' ; other software complains hourly if the don't. In http://xerces.apache.org/mirrors.cgi I see lines like : <option value=" http://mirrors.supportex.net/apache/ "> http://mirrors.supportex.net/apache/&lt;/option > So those are ok. I think that :   if mir[2][:-1] == preferred: should just be :   if mir[2] == preferred: Some form of 'trailing slash' problem has been with us repeatedly ; I remember a change was made after a comlaint about 'cross site scripting', but I don't remember the details. In the xerces download page I see '//'-en, because it contains lines like href="[preferred]/xerces/c/3/sources/xerces-c-3.1.1.tar.gz" The script could substitute "preferred" without the trailing slash, but that could possibly break stuff. It's a pity (for other reasons too) that mirrors.cgi only recognizes the "[preferred]" and has no notion of the 'full" url, or "file-part" of the reference ; if it had, and the "file-part" doesn't exists in /dist/ (and hence doesn't exist on the mirrors, it could refer to archive.a.o (if it exists there). HPP
        Hide
        Sebb added a comment -
        I think the code proposed by Steven is to allow for the user-supplied perferred URL having an *optional* trailing slash.

        The problem at the moment is that the code assumes the following:

          # Note the user-requested mirror doesn't have a trailing-slash

        whereas in fact it almost certainly *will have* a trailing slash.

        This is likely why the Change button does not work; the preferred URL presented when clicking Change includes a trailing slash, which will never match, so the resulting URL is picked at random.

        Seems to me a better solution would be to either assume that the user provided URL always has a slash (which I expect is the case), or force it to have a slash.

        My suggestion is:

          # Check if the requested Preferred mirror is in the list
          # Note the user-requested mirror may not have a trailing-slash
          prefmir = None
          if preferred:
            # Fix up missing trailing slash
            if not preferred.endswith('/'):
              preferred += '/'
            for mir in mirrors:
              if mir[2] == preferred:
                prefmir = mir
                break
        Show
        Sebb added a comment - I think the code proposed by Steven is to allow for the user-supplied perferred URL having an *optional* trailing slash. The problem at the moment is that the code assumes the following:   # Note the user-requested mirror doesn't have a trailing-slash whereas in fact it almost certainly *will have* a trailing slash. This is likely why the Change button does not work; the preferred URL presented when clicking Change includes a trailing slash, which will never match, so the resulting URL is picked at random. Seems to me a better solution would be to either assume that the user provided URL always has a slash (which I expect is the case), or force it to have a slash. My suggestion is:   # Check if the requested Preferred mirror is in the list   # Note the user-requested mirror may not have a trailing-slash   prefmir = None   if preferred:     # Fix up missing trailing slash     if not preferred.endswith('/'):       preferred += '/'     for mir in mirrors:       if mir[2] == preferred:         prefmir = mir         break
        Hide
        Sebb added a comment -
        Here is a test showing how the "preferred" option only works if the trailing slash is absent; however the entries are generated with slashes.

        # no slash; sets preferred mirror properly:
        pwd ~ : curl --data-urlencode Preferred=http://mirror.catn.com/pub/apache -G -s http://commons.apache.org/imaging/download_sanselan.cgi | grep "currently"
         You are currently using <b>http://mirror.catn.com/pub/apache/&lt;/b>. If you

        # with slash, preferred mirror is randomly chosen:
        pwd ~ : curl --data-urlencode Preferred=http://mirror.catn.com/pub/apache/ -G -s http://commons.apache.org/imaging/download_sanselan.cgi | grep "currently"
         You are currently using <b>http://apache.mirrors.hoobly.com/&lt;/b>. If you
        pwd ~ : curl --data-urlencode Preferred=http://mirror.catn.com/pub/apache/ -G -s http://commons.apache.org/imaging/download_sanselan.cgi | grep "currently"
         You are currently using <b>http://mirrors.ibiblio.org/apache/&lt;/b>. If you
        pwd ~ :
        Show
        Sebb added a comment - Here is a test showing how the "preferred" option only works if the trailing slash is absent; however the entries are generated with slashes. # no slash; sets preferred mirror properly: pwd ~ : curl --data-urlencode Preferred= http://mirror.catn.com/pub/apache -G -s http://commons.apache.org/imaging/download_sanselan.cgi | grep "currently"  You are currently using <b> http://mirror.catn.com/pub/apache/&lt;/b >. If you # with slash, preferred mirror is randomly chosen: pwd ~ : curl --data-urlencode Preferred= http://mirror.catn.com/pub/apache/ -G -s http://commons.apache.org/imaging/download_sanselan.cgi | grep "currently"  You are currently using <b> http://apache.mirrors.hoobly.com/&lt;/b >. If you pwd ~ : curl --data-urlencode Preferred= http://mirror.catn.com/pub/apache/ -G -s http://commons.apache.org/imaging/download_sanselan.cgi | grep "currently"  You are currently using <b> http://mirrors.ibiblio.org/apache/&lt;/b >. If you pwd ~ :
        Hide
        Steven J. Hathaway added a comment -
        Sebb,

        Your test verifies how the current /cgi/mirrors/mirrors.cgi python script behaves.

        # no slash; sets preferred mirror properly:
        # with slash, preferred mirror is randomly chosen:

        At XERCES and other Apache projects, we populate a download template.html page by reference using
        the /cgi/mirrors/mirrors.cgi script. This replaces tokens in the template.html page from which the users
        then make their selection to download specific distribution artifacts.

        Here is some mirror access testing.

        In situations where the template contains:
           [preferred]/xerces/c/3/sources/filename.tar.gz
        and the mirror is "http://apache.osuosl.org/" as [preferred] the resulting URL becomes
           "http://apache.osuosl.org//xerces/c/3/sources/filename.tar.gz"
        If the mirror is "http://www.motorlogy.com/apache/" the resulting URL becomes
           "htp://motorlogy.com/apache//xerces/c/3/sources/filename.tar.gz"
        The presence of "//" has NO operational problems with most browsers.
        Empty resource segments in the URL supplementary resource pathnames
        do not affect the usability of the mirror distribution repositories.

        Note the mirrors.cgi script properly returns [preferred] with a trailing '/' slash.
        This is no problem.

        If we at Apache XERCES were to use:
          [preferred]xerces/c/3/filename.tar.gz
        there would be a problem if mirror.cgi replaced [preferred] without a trailing slash.

        When we initially submit our template for edit by mirrors.cgi, there is no &Preferred=<path>
        request. We then get our page populated with a random mirror and a selection form
        filled with alternate mirror prefixes. The selection form contains replacements for [http]
        [ftp] and [backup] mirrors. These items are returned by mirrors.cgi with the slash '/'
        as the last character in the mirror URL. This is good.

        What is bad: We then issue a request to change mirror using the &Preferred=<path>
        request. The &Preferred <path> is retrieved by the form selection. The form
        selection content has the trailing '/' for each mirror URL -- then the secondary check
        in mirror.cgi expects to see the <path> without a '/' but that character is already in
        our selection table. The mirror.cgi then selects another random mirror instead of
        the requested mirror because the trailing '/' slash is not properly interpreted.

        The recommended edit proposed by Henk Penning would solve the problem for
        requests with &Preferred=<path>/ having the '/' as the last character.

        I think that :
          if mir[2][:-1] == preferred:
        should just be :
          if mir[2] == preferred:

        I have no visibility into whether other projects call /dyn/mirrors/mirrors.cgi with
        &Preferred=<path> without a trailing slash.

        My proposed patch, as the result of a comment by <danielsh> would make the
        trailing '/' an optional character being passed as the &Preferred= cgi parameter
        using the HTTP GET method. Normally all &Preferred=<path>/ parameters should
        have the trailing slash. The patch I propose would make the trailing '/' optional.
        Currently, if the trailing slash exists in the &Preferred= parameter, the mirror.cgi
        test for an available mirror fails and a random mirror is again selected.

        Sincerely,
        Steven J. Hathaway
        Xalan Documentation Project



        Show
        Steven J. Hathaway added a comment - Sebb, Your test verifies how the current /cgi/mirrors/mirrors.cgi python script behaves. # no slash; sets preferred mirror properly: # with slash, preferred mirror is randomly chosen: At XERCES and other Apache projects, we populate a download template.html page by reference using the /cgi/mirrors/mirrors.cgi script. This replaces tokens in the template.html page from which the users then make their selection to download specific distribution artifacts. Here is some mirror access testing. In situations where the template contains:    [preferred]/xerces/c/3/sources/filename.tar.gz and the mirror is " http://apache.osuosl.org/ " as [preferred] the resulting URL becomes    " http://apache.osuosl.org//xerces/c/3/sources/filename.tar.gz " If the mirror is " http://www.motorlogy.com/apache/ " the resulting URL becomes    " htp://motorlogy.com/apache//xerces/c/3/sources/filename.tar.gz " The presence of "//" has NO operational problems with most browsers. Empty resource segments in the URL supplementary resource pathnames do not affect the usability of the mirror distribution repositories. Note the mirrors.cgi script properly returns [preferred] with a trailing '/' slash. This is no problem. If we at Apache XERCES were to use:   [preferred]xerces/c/3/filename.tar.gz there would be a problem if mirror.cgi replaced [preferred] without a trailing slash. When we initially submit our template for edit by mirrors.cgi, there is no &Preferred=<path> request. We then get our page populated with a random mirror and a selection form filled with alternate mirror prefixes. The selection form contains replacements for [http] [ftp] and [backup] mirrors. These items are returned by mirrors.cgi with the slash '/' as the last character in the mirror URL. This is good. What is bad: We then issue a request to change mirror using the &Preferred=<path> request. The &Preferred <path> is retrieved by the form selection. The form selection content has the trailing '/' for each mirror URL -- then the secondary check in mirror.cgi expects to see the <path> without a '/' but that character is already in our selection table. The mirror.cgi then selects another random mirror instead of the requested mirror because the trailing '/' slash is not properly interpreted. The recommended edit proposed by Henk Penning would solve the problem for requests with &Preferred=<path>/ having the '/' as the last character. I think that :   if mir[2][:-1] == preferred: should just be :   if mir[2] == preferred: I have no visibility into whether other projects call /dyn/mirrors/mirrors.cgi with &Preferred=<path> without a trailing slash. My proposed patch, as the result of a comment by <danielsh> would make the trailing '/' an optional character being passed as the &Preferred= cgi parameter using the HTTP GET method. Normally all &Preferred=<path>/ parameters should have the trailing slash. The patch I propose would make the trailing '/' optional. Currently, if the trailing slash exists in the &Preferred= parameter, the mirror.cgi test for an available mirror fails and a random mirror is again selected. Sincerely, Steven J. Hathaway Xalan Documentation Project
        Hide
        Steven J. Hathaway added a comment -
        The XERCES project download.html template does not use /dyn/mirrors/mirrors.cgi to
        identify any distribution artifact. We use /dyn/mirrors/mirrors.cgi only to identify a
        usable or preferred mirror from which artifacts can be retrieved.

        Steven J. Hathaway
        Show
        Steven J. Hathaway added a comment - The XERCES project download.html template does not use /dyn/mirrors/mirrors.cgi to identify any distribution artifact. We use /dyn/mirrors/mirrors.cgi only to identify a usable or preferred mirror from which artifacts can be retrieved. Steven J. Hathaway
        Hide
        Sebb added a comment -
        @Steven: I think there are two issues being conflated here.

        1) the behaviour of the script when selecting a mirror using the Preferred CGI parameter (pressing the Change button)
        AFAICT, this is the issue originally raised by the JIRA.

        2) the behaviour of the script when creating the download URL from the selected [preferred] mirror.
        This is a secondary (but important) issue to make sure that the created URL cannot be accidentally compromised.

        As far as I can tell, both your original patch and mine would fix issue 1.

        Given that the script already ensures that mirror paths end with /, entries in the drop-down list will always have a trailing /
        So pressing the Change button will pass a /-terminated URL as the Preferred parameter, which can be compared directly with the entries in the mirror list.
        To allow for using a Preferred URL without trailing slash, it seems to me the simplest is to add a / if required, and then compare with the existing entries in the mirror list.

        Issue 2) is already solved, because the mirror list always contains URLs with trailing /

        This may results in creating URLs with // rather than /, but that does not seem to cause problems with browsers; it just looks wrong.
        Though I don't understand why this is not already happening, as the script I'm looking at (on minotaur) seems to already do this:

          # The mirror URL already has a trailing slash. Avoid doubling it up.
          if path_info.startswith('/'):
            path_info = path_info[1:]
        Show
        Sebb added a comment - @Steven: I think there are two issues being conflated here. 1) the behaviour of the script when selecting a mirror using the Preferred CGI parameter (pressing the Change button) AFAICT, this is the issue originally raised by the JIRA. 2) the behaviour of the script when creating the download URL from the selected [preferred] mirror. This is a secondary (but important) issue to make sure that the created URL cannot be accidentally compromised. As far as I can tell, both your original patch and mine would fix issue 1. Given that the script already ensures that mirror paths end with /, entries in the drop-down list will always have a trailing / So pressing the Change button will pass a /-terminated URL as the Preferred parameter, which can be compared directly with the entries in the mirror list. To allow for using a Preferred URL without trailing slash, it seems to me the simplest is to add a / if required, and then compare with the existing entries in the mirror list. Issue 2) is already solved, because the mirror list always contains URLs with trailing / This may results in creating URLs with // rather than /, but that does not seem to cause problems with browsers; it just looks wrong. Though I don't understand why this is not already happening, as the script I'm looking at (on minotaur) seems to already do this:   # The mirror URL already has a trailing slash. Avoid doubling it up.   if path_info.startswith('/'):     path_info = path_info[1:]
        Hide
        Sebb added a comment -
        Patch to append / if necessary to Preferred URL
        Then compare against mirror list entries directly

        This is slightly different from the code in my comment.
        The Preferred parameter is fixed where it is created, rather than later on where it is used.
        Show
        Sebb added a comment - Patch to append / if necessary to Preferred URL Then compare against mirror list entries directly This is slightly different from the code in my comment. The Preferred parameter is fixed where it is created, rather than later on where it is used.
        Hide
        Sebb added a comment -
        Just realised that the removal of leading / on path_info is irrelevant to the template processing - plain text in the template is passed through without modification.
        So if the template contains a leading / after the [preferred] directive it will be retained.
        Show
        Sebb added a comment - Just realised that the removal of leading / on path_info is irrelevant to the template processing - plain text in the template is passed through without modification. So if the template contains a leading / after the [preferred] directive it will be retained.
        Hide
        Steven J. Hathaway added a comment -
        Sebb,

        Your patch will work if the Python CGI 'preferred' object is mutable, meaning it can be modified
        by appending characters. If the 'preferred' object is immutable, static, cannot be altered, then
        my code should work OK by eliminating the matching of the '/' on the mirror URI string found
        in the list of mirrors.

        It looks like your patch should work.

        Steven J. Hathaway
        Show
        Steven J. Hathaway added a comment - Sebb, Your patch will work if the Python CGI 'preferred' object is mutable, meaning it can be modified by appending characters. If the 'preferred' object is immutable, static, cannot be altered, then my code should work OK by eliminating the matching of the '/' on the mirror URI string found in the list of mirrors. It looks like your patch should work. Steven J. Hathaway
        Hide
        Steven J. Hathaway added a comment -
        As for "//" empty resource strings appearing in URLs ...

        This is related to the SERVER-SIDE directory structures in place
        on our mirror sites. The multiple adjacent "/////" strings that
        delimit unnamed directory structures will collapse to the single
        parent directory. This is an artifact of operating system directory
        structures. Sure it looks awkward .. but "a/////////b/c////d"
        becomes merely "a/b/c/d" when the effective URL is evaluated.
        If we were using URI instead of URL processing, then the "////"
        could be significant. (See the difference between Uniform
        Resource Identifier vs. Uniform Resource Locator).

        Steven J. Hathaway
        Show
        Steven J. Hathaway added a comment - As for "//" empty resource strings appearing in URLs ... This is related to the SERVER-SIDE directory structures in place on our mirror sites. The multiple adjacent "/////" strings that delimit unnamed directory structures will collapse to the single parent directory. This is an artifact of operating system directory structures. Sure it looks awkward .. but "a/////////b/c////d" becomes merely "a/b/c/d" when the effective URL is evaluated. If we were using URI instead of URL processing, then the "////" could be significant. (See the difference between Uniform Resource Identifier vs. Uniform Resource Locator). Steven J. Hathaway
        Hide
        Henk Penning added a comment -
        fyi ; changed mirrors.cgi :

        - # Check if the requested Preferred mirror is in the list
        - # Note the user-requested mirror doesn't have a trailing-slash
        + # Check if the requested Preferred mirror is in the list.
        + # If the user-requested mirror doesn't have a trailing-slash, add '/'.
           prefmir = None
           if preferred:
        + if not preferred.endswith('/'):
        + preferred += '/'
             for mir in mirrors:
        - if mir[2][:-1] == preferred:
        + if mir[2] == preferred:
                 prefmir = mir
        Show
        Henk Penning added a comment - fyi ; changed mirrors.cgi : - # Check if the requested Preferred mirror is in the list - # Note the user-requested mirror doesn't have a trailing-slash + # Check if the requested Preferred mirror is in the list. + # If the user-requested mirror doesn't have a trailing-slash, add '/'.    prefmir = None    if preferred: + if not preferred.endswith('/'): + preferred += '/'      for mir in mirrors: - if mir[2][:-1] == preferred: + if mir[2] == preferred:          prefmir = mir
        Hide
        Steven J. Hathaway added a comment -
        Henk,

        Thanks! The issue IMHO is resolved as fixed.

        Steven J. Hathaway


        Show
        Steven J. Hathaway added a comment - Henk, Thanks! The issue IMHO is resolved as fixed. Steven J. Hathaway
        Hide
        Steven J. Hathaway added a comment -
        Tested a variety of ways - FIXED
        Steven J. Hathaway
        Show
        Steven J. Hathaway added a comment - Tested a variety of ways - FIXED Steven J. Hathaway

          People

          • Assignee:
            Unassigned
            Reporter:
            Steven J. Hathaway
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development