  XalanJ2 / XALANJ-2613

TransformerIdentityImpl doesn't properly handle file URIs with percent-encoded Unicode characters


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.7.2
    • Component/s: transformation
    • Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
    • None

    Description

      When Xalan's javax.xml.transform.Transformer is used with a javax.xml.transform.stream.StreamResult constructed from a java.io.File whose path contains non-ASCII Unicode characters, the Transformer creates the output file at the wrong path.
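      The setup described above can be sketched as follows. This is not the attached Repro.java: the class name UnicodePathRepro, the helper transformInto, and the file name "résumé.xml" are illustrative assumptions. The sketch runs an identity transform (TransformerFactory.newTransformer() with no stylesheet) into a StreamResult built from a File, then lists the files that were actually created. Which implementation runs depends on the configured TransformerFactory; with Xalan 2.7.2 on the classpath, the reported behavior is that the created file name contains percent-escapes instead of the expected characters.

```java
import java.io.File;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class UnicodePathRepro {
    /**
     * Runs an identity transform of a tiny document into dir/fileName and
     * returns the names of the files that actually exist in dir afterwards.
     */
    static String[] transformInto(Path dir, String fileName) throws Exception {
        File target = new File(dir.toFile(), fileName);
        // No stylesheet: newTransformer() returns an identity transformer
        // (TransformerIdentityImpl when Xalan is the configured factory).
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // StreamResult(File) records the file as a percent-encoded URI system ID.
        identity.transform(new StreamSource(new StringReader("<doc/>")),
                           new StreamResult(target));
        return dir.toFile().list();
    }

    public static void main(String[] args) throws Exception {
        // Assumes the JVM can encode non-ASCII file names (e.g. a UTF-8 locale).
        Path dir = Files.createTempDirectory("xalanj-2613");
        String expected = "r\u00e9sum\u00e9.xml"; // "résumé.xml"
        for (String name : transformInto(dir, expected)) {
            System.out.println(name.equals(expected)
                    ? "OK:    " + name
                    : "WRONG: " + name + " (expected " + expected + ")");
        }
    }
}
```

      Whichever path the transformer picks, exactly one file ends up in the temporary directory; comparing its name with the expected one shows whether the system ID was decoded.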

      I have attached a minimal repro: a small Java file and a small bash script that compiles and runs the test and prints a few relevant details about the environment.

       

      The cause of the bug is this:

      When a StreamResult is constructed from a File, it stores the string form of the URI derived from that File. This URI string is properly formed, which means each segment of its path is percent-encoded. The Xalan TransformerImpl class calls getSystemId() on the StreamResult to obtain this string, simply strips the leading "file://" prefix, and uses the remainder to create a FileOutputStream. However, the remainder is still percent-encoded, so it is not a valid file-system path and must not be passed directly to FileOutputStream. Instead, the code should use a URI utility to parse the URI string and undo the percent-encoding, yielding a path suitable for creating the FileOutputStream.
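      The difference between stripping the prefix and decoding the URI can be sketched with plain JDK classes. naivePath is a hypothetical, simplified stand-in for the prefix stripping described above (it is not Xalan's actual code), and decodedPath shows one possible URI-based conversion using java.net.URI and the File(URI) constructor; it is not necessarily what the attached URL-encoding-fix.diff does.

```java
import java.io.File;
import java.net.URI;

public class SystemIdToPath {
    /**
     * Hypothetical, simplified stand-in for the buggy conversion: drop the
     * "file:" scheme and use the rest verbatim, percent-escapes included.
     */
    static String naivePath(String systemId) {
        return systemId.startsWith("file:")
                ? systemId.substring("file:".length())
                : systemId;
    }

    /**
     * Parses the system ID as a URI; File(URI) decodes the percent-escapes
     * (as UTF-8) when it extracts the path.
     */
    static String decodedPath(String systemId) throws Exception {
        return new File(new URI(systemId)).getPath();
    }

    public static void main(String[] args) throws Exception {
        String systemId = "file:/tmp/r%C3%A9sum%C3%A9.xml";
        System.out.println(naivePath(systemId));   // /tmp/r%C3%A9sum%C3%A9.xml (a different file)
        System.out.println(decodedPath(systemId)); // "/tmp/résumé.xml" on Unix-like systems
    }
}
```

      Under this approach, the output stream would be opened with new FileOutputStream(new File(new URI(systemId))) rather than with the stripped string.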

      For a path made up only of ASCII letters, digits, and common punctuation such as "/", ".", "-", and "_", percent-encoding leaves the string unchanged, so the code happens to work. As soon as the path contains any non-ASCII character, however, the encoded and decoded forms differ, and the output is written to the wrong path.
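      The encoding step can be observed directly. The sketch below (class and method names are illustrative) prints the system ID that StreamResult records for three absolute paths. In the JDK, StreamResult(File) builds the system ID from File.toURI().toASCIIString(), which emits each non-ASCII character as its UTF-8 bytes in %XX form. The expected values in the comments assume a Unix-like system and that none of the paths is an existing directory, since File.toURI() appends a trailing slash for directories.

```java
import java.io.File;
import javax.xml.transform.stream.StreamResult;

public class EncodingDemo {
    /** Returns the system ID that StreamResult records for the given file path. */
    static String systemIdFor(String path) {
        return new StreamResult(new File(path)).getSystemId();
    }

    public static void main(String[] args) {
        // ASCII letters, digits, '/', '.', '-' and '_' pass through unchanged.
        System.out.println(systemIdFor("/tmp/report-1.xml"));          // file:/tmp/report-1.xml
        // Each non-ASCII character becomes its percent-encoded UTF-8 bytes.
        System.out.println(systemIdFor("/tmp/r\u00e9sum\u00e9.xml"));  // file:/tmp/r%C3%A9sum%C3%A9.xml
        // Some ASCII characters are encoded too, e.g. a space becomes %20.
        System.out.println(systemIdFor("/tmp/my report.xml"));         // file:/tmp/my%20report.xml
    }
}
```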

      Because the write to the wrong path may silently succeed, this bug may have security implications.

      Attachments

        1. Repro.java
          2 kB
          Joshua Maurice
        2. runtest.sh
          0.3 kB
          Joshua Maurice
        3. URL-encoding-fix.diff
          2 kB
          Lorenzo Dalla Vecchia


          People

            Assignee: Steven J. Hathaway (shathaway)
            Reporter: Joshua Maurice (JoshuaMaurice)
            Votes: 1
            Watchers: 3
