OFBiz / OFBIZ-10275

UtilCodec URL decoding breaks values with German umlauts

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Trunk
    • Fix Version/s: 16.11.05, 17.12.01, 18.12.01
    • Component/s: framework
    • Labels: None

    Description

      URL decoding via UtilCodec breaks German umlauts and other characters that UTF-8 encodes as two bytes (two %XX hex values), as in this example:

      String example = "/webcontent/example_öl.jpg";
      String encoded = UtilCodec.getEncoder("url").encode(example);
      System.out.println(encoded);
      => "%2Fwebcontent%2Fexample_%C3%B6l.jpg"

      String decoded = UtilCodec.getDecoder("url").decode(encoded);
      System.out.println(decoded);
      => expected: "/webcontent/example_öl.jpg"; actual: the "ö" comes back broken

       

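      The cause is the OWASP ESAPI PercentCodec implementation used inside the method UtilCodec.canonicalize, which is called before the actual decoding via java.net.URLDecoder. PercentCodec appears to turn each %XX sequence into a character of its own, so a two-byte UTF-8 sequence such as %C3%B6 is already split into two characters by the time URLDecoder sees the value: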

      public String decode(String original) {
          try {
              String canonical = canonicalize(original);
              return URLDecoder.decode(canonical, "UTF-8");
          } catch (UnsupportedEncodingException ee) {
              Debug.logError(ee, module);
              return null;
          }
      }
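
      To illustrate the effect, here is a minimal sketch that does not use OFBiz or ESAPI at all. The class PercentDecodeDemo and its method decodeEachByteAsChar are hypothetical stand-ins that mimic a decoder treating every %XX sequence as one character; they are an assumption about the behaviour described above, not the ESAPI code itself.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class PercentDecodeDemo {

    // Hypothetical stand-in: each %XX becomes exactly one char, with no
    // awareness that several bytes may form a single UTF-8 character.
    static String decodeEachByteAsChar(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '%' && i + 2 < input.length()
                    && Character.digit(input.charAt(i + 1), 16) >= 0
                    && Character.digit(input.charAt(i + 2), 16) >= 0) {
                out.append((char) Integer.parseInt(input.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // Plain java.net.URLDecoder call: byte sequences are reassembled as UTF-8.
    static String decodeUtf8(String input) {
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String encoded = "%2Fwebcontent%2Fexample_%C3%B6l.jpg";

        // Per-sequence decoding first, URLDecoder second (the order used in decode above):
        // %C3 and %B6 end up as the two chars U+00C3 and U+00B6, and URLDecoder
        // has nothing left to decode.
        String broken = decodeUtf8(decodeEachByteAsChar(encoded));

        // URLDecoder on the original value: %C3%B6 is read as one UTF-8 character, U+00F6.
        String intact = decodeUtf8(encoded);

        System.out.println(broken.length() + " chars: " + broken);
        System.out.println(intact.length() + " chars: " + intact);
    }
}
```

      The first result is one character longer than the second, because the umlaut has been replaced by two separate characters.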

       

      The fix could be to use the canonicalize logic only to check the original value for double/mixed encoding, and afterwards to decode the original value via URLDecoder instead of decoding the canonicalize output.
      This way the UrlCodec decode method mirrors the encode method: both rely only on URLDecoder / URLEncoder for the main work.
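
      A rough sketch of that idea, written without OFBiz or ESAPI so it can run on its own. The class UrlDecodeSketch and the method looksDoubleEncoded are hypothetical; looksDoubleEncoded is only a simplified stand-in for the double/mixed-encoding check done by canonicalize, and rejecting such input with an exception is an assumption, not necessarily how OFBiz handles it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UrlDecodeSketch {

    // Hypothetical, simplified stand-in for the canonicalize check:
    // "%25" is an encoded '%', so "%25" followed by two hex digits would
    // decode to yet another %XX escape, i.e. the value is double-encoded.
    static boolean looksDoubleEncoded(String original) {
        return original.matches("(?is).*%25[0-9a-f]{2}.*");
    }

    static String decode(String original) {
        // Step 1: use the check purely as validation of the original value.
        if (looksDoubleEncoded(original)) {
            throw new IllegalArgumentException("Double-encoded input rejected: " + original);
        }
        try {
            // Step 2: decode the ORIGINAL value, not the output of the check,
            // so multi-byte UTF-8 sequences reach URLDecoder intact.
            return URLDecoder.decode(original, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("%2Fwebcontent%2Fexample_%C3%B6l.jpg"));
    }
}
```

      With this split, the check never touches the value that is returned, so decode stays symmetric to an encode that relies on URLEncoder alone.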

            People

              Assignee: mbrohl Michael Brohl
              Reporter: mbecker Martin Becker
              Votes: 0
              Watchers: 3
