  1. Subversion
  2. SVN-2487

mod_dav_svn and locales fail to play nicely together

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: trunk
    • Fix Version/s: 1.8.0
    • Component/s: mod_dav_svn

      Description

      This problem can manifest itself in a number of ways, but the root issue is
      that httpd runs in the C locale, the default POSIX locale, which uses the
      7-bit ASCII character set.  In various places Subversion converts its
      internal UTF-8 strings into natively encoded strings, and under DAV that
      conversion fails whenever the UTF-8 string includes multibyte characters,
      because those characters cannot be expressed in 7-bit ASCII.
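
      The failure mode can be reproduced outside Subversion.  The sketch below is
      only an illustration: utf8_to_native is a hypothetical helper, not a
      Subversion function, and it assumes the native charset is plain ASCII, as
      it would be in the C locale.

```python
def utf8_to_native(utf8_path: bytes, native_charset: str = "ascii") -> bytes:
    """Re-encode a UTF-8 path into the (assumed) native charset.

    Hypothetical helper for illustration; "ascii" stands in for the
    charset of the C locale.
    """
    return utf8_path.decode("utf-8").encode(native_charset)


if __name__ == "__main__":
    # A pure-ASCII path survives the conversion unchanged.
    print(utf8_to_native("trunk/README".encode("utf-8")))

    # A path with a multibyte character cannot be expressed in 7-bit ASCII.
    multibyte_path = "trunk/caf\u00e9.txt".encode("utf-8")
    try:
        utf8_to_native(multibyte_path)
    except UnicodeEncodeError as err:
        print("conversion failed:", err.reason)
```

      Passing a different native_charset, such as "utf-8", lets the same path
      through, which is why the problem only shows up under a restrictive locale.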
      
      I brought this problem up on the HTTPD dev list, and the consensus is that httpd
      runs in the C locale because using the system locale (i.e. respecting the LANG
      or LC_ALL environment variables via a call to setlocale(LC_ALL, "") like svn
      does for all its command line programs) results in unpredictable behavior for
      various functions that depend on the locale.  Apparently some modules take
      matters into their own hands and call setlocale themselves, but this is
      very much not a recommended practice because the locale is a process-wide
      setting and the results are not easily predicted.
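
      To see why a setlocale call in one module affects everything else in the
      process, consider this small sketch.  The function name is hypothetical and
      the worker thread merely stands in for "some other module"; it uses Python's
      locale module, which wraps the C setlocale call.

```python
import locale
import threading


def locale_seen_after_worker_change(new_locale: str = "C") -> str:
    """Let a worker thread change LC_CTYPE, then report what the caller sees."""
    # Calling setlocale with no locale argument only queries the current value.
    previous = locale.setlocale(locale.LC_CTYPE)

    # The worker plays the role of a module that sets the locale on its own.
    worker = threading.Thread(
        target=locale.setlocale, args=(locale.LC_CTYPE, new_locale)
    )
    worker.start()
    worker.join()

    # The calling thread observes the worker's change: the setting is global.
    observed = locale.setlocale(locale.LC_CTYPE)

    # Restore the earlier value so the demonstration has no lasting effect.
    locale.setlocale(locale.LC_CTYPE, previous)
    return observed


if __name__ == "__main__":
    print(locale_seen_after_worker_change("C"))
```

      Nothing in the calling thread asked for a locale change, yet it sees the new
      value, which is the unpredictability the httpd developers were warning about.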
      
      What is the end result of this?
      
      Well, you can't use 'svn lock' on a file that has multibyte characters in its
      path, because when mod_dav_svn calls the pre-lock hook script it needs to
      pass the filename, which it first tries to translate into the native
      encoding.  This case is even weirder: the hook script runs in an empty
      environment, so it cannot see the locale environment variables of the
      parent httpd process and has no way to know what its locale actually is.
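
      The effect of an empty environment on a child process can be sketched as
      follows.  This is not how Subversion launches hooks; child_environment is a
      hypothetical helper, and it assumes a POSIX system where /usr/bin/env exists
      and prints its environment as NAME=value lines.

```python
import subprocess


def child_environment(env: dict) -> dict:
    """Run /usr/bin/env with exactly `env` and return what the child sees."""
    result = subprocess.run(
        ["/usr/bin/env"],     # assumed location of env(1) on POSIX systems
        env=env,              # the child's entire environment
        capture_output=True,
        text=True,
        check=True,
    )
    # Parse NAME=value lines into a dictionary.
    return dict(
        line.split("=", 1) for line in result.stdout.splitlines() if "=" in line
    )


if __name__ == "__main__":
    # A child started with an empty environment, like the hook script described above.
    hook_like = child_environment({})
    print("LANG" in hook_like, "LC_ALL" in hook_like)

    # The locale variables reach the child only if they are passed on explicitly.
    print(child_environment({"LANG": "en_US.UTF-8"}).get("LANG"))
```

      With nothing passed down, the child has no LANG or LC_ALL to consult, so it
      cannot tell which encoding its arguments were converted to.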
      
      Perhaps more disturbing is that if a repository has multibyte characters in its
      path (i.e. it simply lives in a directory with multibyte characters in its
      name), you can't do anything at all with it.  Even browsing the repository
      results in errors about not being able to open the repository, and the
      underlying errors in error_log are, predictably, about translating from
      UTF-8 to the native encoding.
      
      What's the fix?  I have no idea.  We can't just trust that converting to
      "native" encoding is the correct thing to do in all cases, but unfortunately
      we've got an awful lot of code in svn that assumes that's what it should be
      doing when paths are passed from svn's internals into the outside world.
      Additionally, in the case of hook scripts it's not clear that converting to
      native is even desirable, since the script doesn't have any way to tell what
      the native encoding actually is.
      

        Attachments

        1. 1_subversion-1.4.3-svn_locale_charset.patch
          3 kB
          Subversion Importer
        2. 2_subversion-1.6.5-svn_locale_charset.patch
          3 kB
          Subversion Importer
        3. 3_assumed-native-charset.diff
          2 kB
          Stefan Sperling
        4. 4_force-utf8.diff
          2 kB
          Stefan Sperling
        5. 5_mod_dav_svn.diff
          1 kB
          Stefan Sperling
        6. 6_no-env.diff
          7 kB
          Stefan Sperling

            People

            • Assignee:
              Unassigned
              Reporter:
              rooneg Garrett Rooney
