Velocity / VELOCITY-191

UnicodeFileResourceLoader for Win2k Notepad UTF-8 files

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.3.1
    • Fix Version/s: 1.5
    • Component/s: Engine
    • Labels:
      None
    • Environment:
      Operating System: All
      Platform: All

      Description

      [copypaste from velocity-user mailing list]
      Date: Mon, 14 Jul 2003 01:52:11 -0700 (PDT)
      From: mailmur <mailmur@yahoo.com>
      Subject: UnicodeFileResourceLoader to support Win2k Notepad UTF-8 files
      Content-Type: text/plain; charset=us-ascii

      I discovered that files saved in Win2k Notepad's UTF-8 format always produce
      an extra ? character at the start of ISO-8859-1 output text.

      This is due to the lack of UTF-8 BOM support in the
      InputStreamReader/OutputStreamWriter classes.

      I then created an InputStream implementation that skips the BOM to work around
      this bug.

      Please find the source and a test program here to see it for yourself. I created
      UnicodeFileResourceLoader to make all this transparent.
      http://koti.mbnet.fi/akini/java/unicodereader/

      I don't know the proper procedure for adding this to the Velocity core (if you
      find it useful), but here it is. Feel free to change the class package. Or is
      this even the right list to announce such an addition?

      Here is a link to Sun bugparade about the UTF-8 BOM problem:
      http://developer.java.sun.com/developer/bugParade/bugs/4508058.html
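
For illustration, the BOM-skipping idea described above can be sketched with the JDK's PushbackInputStream. This is a minimal sketch and not the reporter's UnicodeInputStream: the class name Utf8BomSkipper and its methods are hypothetical, and only the UTF-8 BOM (EF BB BF) is handled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, written only to illustrate the technique.
final class Utf8BomSkipper {
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // Read up to three bytes; a single read() call may return fewer.
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == UTF8_BOM.length && Arrays.equals(head, UTF8_BOM);
        if (!hasBom && n > 0) {
            // Not a BOM: give the bytes back so the caller sees the full content.
            pushback.unread(head, 0, n);
        }
        return pushback;
    }

    /** Convenience wrapper: decodes a byte array as UTF-8, ignoring a leading BOM. */
    static String decodeUtf8(byte[] raw) {
        try (InputStream in = skipUtf8Bom(new ByteArrayInputStream(raw))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] notepadFile = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'H', 'i'};
        // Without skipping, the BOM decodes to U+FEFF, which ISO-8859-1 cannot
        // encode, so the encoder substitutes '?'.
        String naive = new String(notepadFile, StandardCharsets.UTF_8);
        byte[] latin1 = naive.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));
        // With skipping, only the template text remains.
        System.out.println(decodeUtf8(notepadFile));
    }
}
```

The pushback buffer is sized to the BOM length, so files shorter than three bytes and files without a BOM pass through unchanged.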

          Activity

          Will Glass-Husain added a comment -

          Hi Mailmur. I'm going through some old issues. Thanks for contributing this. Sorry it's been such a long time.

          Would you be willing to place this code (or a link to your web page) on the Velocity wiki? That's a good place for user contributed code. I'm thinking that this might be too specialized for incorporation into the Velocity engine itself. But it'd be nice to make this more prominent for users who might need such a solution.

          http://wiki.apache.org/jakarta-velocity/ContributedCode

          Let us know. If that's ok with you I'll resolve this issue.

          Henning Schmiedehausen added a comment -

          Not sure if that will make the 1.5 boat. Please make sure that your example code is either clearly labeled as "might be added to the ASF source code base" or (better) attach it to the issue with the "contribute to the ASF" button set. Thanks.

          Whome added a comment -

          I had completely forgotten this issue, thx for reminding me. I really hope we can get this into the 1.5 release.

          I created a zip file where all the necessary changes can be found.
          http://koti.mbnet.fi/akini/java/unicodereader/FileResourceLoader-UnicodeStream.zip

          See "readme.txt" in the zip root.

          I came up with a clever solution that avoids copy-pasting 99% of the FileResourceLoader file. I have introduced a "skipBOM" resource loader property. See the modified FileResourceLoader; it has only two minor changes.

          • init() method reads skipBOM attribute from configuration
          • findTemplate() has "if-then-else" according to skipBOM value

          changes:
          new file: src\java\org\apache\velocity\io\UnicodeInputStream.java
          modified file: src\java\org\apache\velocity\runtime\resource\loader\FileResourceLoader.java

          velocity.properties example:
          -------------
          runtime.log = ./velocity.log
          runtime.log.invalid.references = false

          # Change loader.class and see what happens if you run
          # templates saved with Win2k UTF-8 format.

          resource.loader = file
          file.resource.loader.class = org.apache.velocity.runtime.resource.loader.FileResourceLoader
          file.resource.loader.path = ./templates
          file.resource.loader.cache = true
          file.resource.loader.skipBOM = true
          file.resource.loader.modificationCheckInterval = 5

          input.encoding = UTF-8
          output.encoding = UTF-8
          -------------

          If we get this into the Velocity code base, I can drop my proprietary class and use the Velocity library out of the box.

          Will Glass-Husain added a comment -

          Mailmur, for legal reasons we need a clear statement from you that "I grant license to ASF for inclusion in ASF works (as per the Apache Software License §5)".

          When you attach the code to an issue, you can mark a checkbox. But if we download it from your site you need to say this explicitly somewhere.

          Thanks!

          Whome added a comment -

          New FileResourceLoader to support transparent Unicode BOM recognition.

          Will Glass-Husain added a comment -

          excellent. looks like legal requirements now satisfied. will investigate and apply.

          Henri Yandell added a comment -

          Is this something that would be useful in Commons IO for others to use? [not suggesting Velocity add a dependency...I imagine we would c+p]

          Whome added a comment -

          Henri,
          Absolutely yes, if you see benefits. You can look at my original web page, where I have the UnicodeReader.java and UnicodeInputStream.java sources (copy-pasted into the Velocity package). I have used them since July 2003 without problems. I have not run any serious performance benchmarks, but that is something we can always revisit later.

          http://koti.mbnet.fi/akini/java/unicodereader/

          Sadly, Sun first fixed this bug (UTF-8 BOM not recognized), but then found that one of their app servers had problems with the auto-recognized UTF-8 handling. So Sun reverted the fix and marked the bug as "will not be fixed". In the end, we must use our own code, now and in the future, to auto-recognize Unicode BOMs.
          http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4508058

          Henning Schmiedehausen added a comment -

          Due to current weather conditions over here I am in the home office and have time to get it in as a really, really late addition. Thank Will and Nathan for lobbying for this on the mailing list.

          • You actually have a long-standing bug in there. Your test for UTF-16LE should come after the test for UTF-32LE, because the UTF-32LE BOM (FF FE 00 00)
            begins with the UTF-16LE BOM (FF FE): if the two-byte test runs first and matches, the UTF-32LE test is never reached.
          • This is a good case for unit tests. Patches without Unit tests are not good (TM).
          • I've rewritten it a bit to make it more readable and to make the function of skipBOM easier to understand.
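
The ordering problem in the first bullet can be illustrated with a small detector that checks the four-byte signatures before the two-byte ones. This is a sketch under assumptions, not the code that was committed to Velocity: the class name BomDetector and its methods are hypothetical.

```java
// Hypothetical helper, written only to illustrate why the check order matters.
final class BomDetector {

    /** Returns the encoding name implied by a leading BOM, or null if none matches. */
    static String detect(byte[] head) {
        // Four-byte signatures first: the UTF-32LE BOM starts with the
        // UTF-16LE BOM, so testing UTF-16LE first would shadow UTF-32LE.
        if (startsWith(head, 0x00, 0x00, 0xFE, 0xFF)) {
            return "UTF-32BE";
        }
        if (startsWith(head, 0xFF, 0xFE, 0x00, 0x00)) {
            return "UTF-32LE";
        }
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return "UTF-8";
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return "UTF-16BE";
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return "UTF-16LE";
        }
        return null;
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf32le = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00, 'A', 0x00, 0x00, 0x00};
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'A', 0x00};
        System.out.println(detect(utf32le));
        System.out.println(detect(utf16le));
    }
}
```

One caveat of any BOM sniffing: the bytes FF FE 00 00 could also be UTF-16LE text whose first character is U+0000, so the longest-match rule is a heuristic rather than a proof of the encoding.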
          Henning Schmiedehausen added a comment -

          This is the last and final patch that goes into 1.5.

          However, having studied this further, it is a kludge at best. I mainly put it in to help users who run into this problem.

          The BOM encoding is not actually used anywhere; the stream is mainly used to skip over the BOM so that it does not show up in the templates. However, if we had a way to pass the encoding "up" into the engine (which would mainly mean that the resource loaders pass in a Reader rather than an InputStream), we could "autodetect" the file encodings.

          Velocity 2.0 stuff, I'm afraid...

          Whome added a comment -

          Good catch on the UTF-16LE / UTF-32LE issue, thx.

          Yes, I studied this transparent encoding a bit when doing UnicodeFileResourceLoader. My web page has a UnicodeReader.java IO class that uses the Reader interface. But as you describe, it is a no-can-do in the current state because of the InputStream interface.
          http://koti.mbnet.fi/akini/java/unicodereader/

          If Velocity 2 is to use the Reader interface, then we can throw away the "input.encoding" parameter, or keep it as a legacy default encoding.
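
As a rough sketch of the Reader-based idea discussed here, the BOM can select the charset while a caller-supplied default plays the role of the "input.encoding" fallback. Everything below is hypothetical: BomAwareReaders is not a Velocity class or the reporter's UnicodeReader, and for brevity only UTF-8 and UTF-16 BOMs are recognized (UTF-32 is left out).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: picks the charset from the BOM, else uses the default.
final class BomAwareReaders {

    static Reader open(InputStream in, Charset defaultCharset) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }

        Charset charset = defaultCharset;
        int bomLength = 0;
        if (n >= 3 && u(head[0]) == 0xEF && u(head[1]) == 0xBB && u(head[2]) == 0xBF) {
            charset = StandardCharsets.UTF_8;
            bomLength = 3;
        } else if (n >= 2 && u(head[0]) == 0xFE && u(head[1]) == 0xFF) {
            charset = StandardCharsets.UTF_16BE;
            bomLength = 2;
        } else if (n >= 2 && u(head[0]) == 0xFF && u(head[1]) == 0xFE) {
            charset = StandardCharsets.UTF_16LE;
            bomLength = 2;
        }

        // Push back whatever was read beyond the BOM so no content is lost.
        if (n > bomLength) {
            pushback.unread(head, bomLength, n - bomLength);
        }
        return new InputStreamReader(pushback, charset);
    }

    private static int u(byte b) {
        return b & 0xFF;
    }

    /** Convenience wrapper that reads the whole input into a String. */
    static String readAll(byte[] raw, Charset defaultCharset) {
        try (Reader reader = open(new ByteArrayInputStream(raw), defaultCharset)) {
            StringBuilder text = new StringBuilder();
            char[] buffer = new char[1024];
            int r;
            while ((r = reader.read(buffer)) != -1) {
                text.append(buffer, 0, r);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the caller receives a Reader, the detected encoding never has to be passed around separately, which is the point made in the comments above.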

          Henning Schmiedehausen added a comment -

          Close all resolved issues for Engine 1.5 release.

          Anders Båtstrand added a comment -

          I can not find the property "<name>.resource.loader.skipBOM" in http://velocity.apache.org/engine/releases/velocity-1.7/developer-guide.html#Configuring_Resource_Loaders, nor does it work for me. Is this property still relevant? Or did it get removed?


            People

            • Assignee: Henning Schmiedehausen
            • Reporter: Whome
            • Votes: 0
            • Watchers: 2