  1. Subversion
  2. SVN-807

gracefully degrade from failed charset conversion


Details

    • Bug
    • Status: Closed
    • Minor
    • Resolution: Fixed
    • all
    • 1.0.0
    • None
    • None

    Description

      Right now, if a log message contains characters that cannot be
      represented in the client's locale, that log message will simply show
      up as:
      
         "[unconvertible log msg]"
      
      Graceful degradation would be nice here :-).
      
      See the dev list thread "Re: converting unconvertible UTF-8 data" for
      discussion of possible solutions.
      
      My first idea was to write a fuzzy converter function that replaces
      every unconverted byte with an escape sequence representing its
      numerical code ("?\XXX" or somesuch).
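
A minimal sketch of the fuzzy converter idea, in Python for brevity. The
function name `fuzzy_convert` and the exact escape format (one `?\DDD`
group per unconvertible byte, with DDD as a three-digit decimal code) are
assumptions made for illustration, not a settled design. It also assumes
the input is valid UTF-8 and the target charset is stateless.

```python
def fuzzy_convert(utf8_data: bytes, to_charset: str) -> bytes:
    """Convert UTF-8 bytes to to_charset, escaping whatever cannot be converted."""
    out = bytearray()
    for ch in utf8_data.decode("utf-8"):
        try:
            # Characters the target charset can represent pass through normally.
            out += ch.encode(to_charset)
        except UnicodeEncodeError:
            # Otherwise emit ?\DDD for each byte of the character's UTF-8 form.
            for byte in ch.encode("utf-8"):
                out += b"?\\%03d" % byte
    return bytes(out)
```

With an ASCII target, a message containing "é" (UTF-8 bytes 195 and 169)
would come out with `?\195?\169` in place of that character, so the rest
of the log message stays readable.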
      
      Then Ulrich Drepper pointed out that since this data is mainly for
      human consumption, the "//TRANSLIT" behavior of glibc's iconv and GNU
      libiconv would produce more readable output.  We can at least detect
      when we're using one of those iconv implementations and append that
      option to the to-charset string where appropriate.  (Marcus Comstedt
      points out that some iconv implementations automatically do
      transliteration for you, and don't even tell you whether or not it's
      happened, which is sort of unnerving.)
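
One way to "append that option where appropriate" is to try the
transliterating to-charset first and fall back to the plain one. The sketch
below is hypothetical: `open_with_translit` and its `opener` parameter are
names invented here, with `opener` standing in for whatever wraps
iconv_open() and raises an error when the descriptor cannot be created.

```python
def open_with_translit(to_charset, from_charset, opener):
    """Return (target_used, descriptor), preferring to_charset//TRANSLIT."""
    for target in (to_charset + "//TRANSLIT", to_charset):
        try:
            # opener is assumed to raise OSError or LookupError on failure,
            # much as iconv_open() returns (iconv_t)-1.
            return target, opener(target, from_charset)
        except (OSError, LookupError):
            continue
    raise LookupError(f"no conversion from {from_charset} to {to_charset}")
```

A successful open only shows that the suffix was accepted. It says nothing
about implementations that transliterate silently, which is the case Marcus
describes.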
      
      However, if you are on a system whose iconv doesn't support
      transliteration, you'll still get the "[unconvertible log msg]"
      result shown above.
      
      So there are various non-mutually-exclusive steps to take here:
      
         - Write the fuzzy function with the escape codes, and use it
           where transliteration is not available.
      
         - Meanwhile, get Subversion doing transliteration where possible
           (Ulrich may do this).
      
         - Possible early fix: make "svn log" accept --force or
           --message-encoding, so one can make it output the raw bytes or
           a specific encoding, respectively.
      

      Attachments

        1. 2_ulrich.mbox
          4 kB
          Karl Fogel
        2. 1_brane-utf-8.mbox
          8 kB
          Karl Fogel

          People

            Unassigned
            Karl Fogel (kfogel)
