
Key: SHALE-292
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Gary VanMatre
Reporter: Tom Pasierb
Votes: 0
Watchers: 1
Shale

Clay doesn't consider file's encoding when loading/parsing html templates from hdd

Created: 25/Sep/06 05:03 PM   Updated: 23/Jan/07 04:40 PM
Component/s: Clay
Affects Version/s: 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4-SNAPSHOT
Fix Version/s: 1.0.4

File Attachments:
some.html (HTML file, 0.5 kB), added 2006-09-25 05:07 PM by Tom Pasierb. Licensed for inclusion in ASF works.
whatever.jsp (1 kB), added 2006-09-25 05:07 PM by Tom Pasierb. Licensed for inclusion in ASF works.
Environment: windows xp, tomcat 5.5 (started with -Dfile.encoding=UTF-8 option, this way myfaces doesn't convert all non-ascii characters to html entities), myfaces 1.1.3

Flags: Important


Description
Clay reads HTML files assuming ASCII encoding, so templates cannot contain non-ASCII characters: they do not display correctly. As indicated on the user mailing list, a Reader should be used for reading templates instead of an InputStream. I wrote more about this on the Shale user mailing list.
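
To illustrate the difference between the two approaches, here is a minimal sketch. It is not Clay's code: the class and method names are hypothetical, and the byte-to-char cast is only an assumed example of how an encoding-unaware read typically goes wrong.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical demo class; not part of Shale or Clay.
final class TemplateDecodingDemo {

    // Encoding-unaware read: every byte becomes one char, so a multi-byte
    // UTF-8 sequence is split into several unrelated characters.
    static String readBytewise(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1) {
                sb.append((char) b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Encoding-aware read: a Reader decodes the bytes with an explicit charset.
    static String readWithCharset(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

With a UTF-8 template, the first method turns each non-ASCII character into two or more wrong characters, while the second returns the original text as long as the charset passed in matches the file.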

We probably need:
1. An app-wide config option for the encoding Clay should use when reading templates. Clay would default to this setting unless
2. a per-file encoding option was set (something similar to the @page pageEncoding directive for JSPs).

I marked it as major as this should be corrected if one wants to develop localized applications with non-ascii characters in html templates.

As noted by Craig this probably also applies to xml templates, which I haven't tried myself.

Comments
Tom Pasierb added a comment - 25/Sep/06 05:07 PM
Both files are encoded in utf-8 encoding.

Gary VanMatre added a comment - 26/Sep/06 04:24 PM
I like the idea of being able to specify the encoding per document template. Many templates can be included in a single page. We might be able to build on a similar idea that is built into the clay markup parser.

The Clay HTML template parser has a couple of special tokens that are used to block out markup that should be excluded from the document. These tokens take the form of comments.

<!-- ### clay:remove ### -->
   <html>
       exclude this text
<!-- ### /clay:remove ### -->

These special comments and the markup between them are ignored, that is, dropped from the document.
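
As a rough illustration of that behavior (not the parser's real implementation), the effect can be approximated with a regular expression. The class name and the exact whitespace tolerance are assumptions made for this sketch.

```java
import java.util.regex.Pattern;

// Hypothetical helper; Clay's own parser is token based and works differently.
final class RemoveBlockStripper {

    // Matches an opening clay:remove comment, everything after it (non-greedy),
    // and the first closing /clay:remove comment that follows.
    private static final Pattern REMOVE_BLOCK = Pattern.compile(
        "<!--\\s*###\\s*clay:remove\\s*###\\s*-->"
            + ".*?"
            + "<!--\\s*###\\s*/clay:remove\\s*###\\s*-->",
        Pattern.DOTALL);

    // Returns the template text with every remove block dropped.
    static String strip(String template) {
        return REMOVE_BLOCK.matcher(template).replaceAll("");
    }
}
```

Calling `RemoveBlockStripper.strip(...)` on the snippet above would leave nothing of the excluded block, while markup outside the two comments is kept unchanged.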

What if we used another special comment token that operates like a page directive?

<!-- ### clay:page charset="UTF-8" / -->

This comment would have to be in the first few bytes of the template document. The top of each template could be sniffed for this token. If it exists, extract the charset and open the target template with the specified encoding.

 
Reading each template file would be broken down into two steps.
1) Look at the top of the template for the token comment containing the charset. If not found, use the vm's default "file.encoding".
2) Read the template in with the determined encoding
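
The first step could be sketched as follows. This is only an illustration under stated assumptions: the class name, the 512-byte sniff limit, and the decision to ignore an unknown charset name are all hypothetical, and the pattern accepts the directive with or without a trailing `###` before the closing `-->`. It also assumes the directive itself is written in an ASCII-compatible encoding, so it would not find a directive in a UTF-16 file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not the code that was committed to Clay.
final class CharsetSniffer {

    // Assumed limit on how far into the template the directive may appear.
    static final int SNIFF_LIMIT = 512;

    // Matches e.g. <!-- ### clay:page charset="UTF-8" / -->
    private static final Pattern PAGE_DIRECTIVE = Pattern.compile(
        "<!--\\s*###\\s*clay:page\\s+charset=\"([^\"]+)\"\\s*/\\s*(?:###\\s*)?-->");

    // Returns the charset named in the directive, or the fallback when the
    // directive is missing or names a charset this JVM does not know.
    static Charset sniff(byte[] head, Charset fallback) {
        int length = Math.min(head.length, SNIFF_LIMIT);
        // ISO-8859-1 maps each byte to exactly one char, so this decoding
        // never fails and leaves the ASCII text of the directive intact.
        String prefix = new String(head, 0, length, StandardCharsets.ISO_8859_1);
        Matcher matcher = PAGE_DIRECTIVE.matcher(prefix);
        if (matcher.find()) {
            try {
                return Charset.forName(matcher.group(1).trim());
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through.
            }
        }
        return fallback;
    }
}
```

The second step would then open the template with a Reader built from the returned charset, passing `Charset.defaultCharset()` as the fallback to approximate the VM's "file.encoding".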

Is this a sound plan? Any thoughts?

Tom Pasierb added a comment - 27/Sep/06 06:39 AM
This sounds like a good idea.

However, it would be nice to have an extra application-wide config option for loading HTML templates. The processing would look like this:

1) Look at the top of the template for the token comment containing the charset. If not found,
2) look for the app-wide config option for template encoding. If found, use that encoding for reading the template. If not found, use the VM's default "file.encoding".
3) Read the template in with the determined encoding

This way one could have all the templates in a given encoding. Unless there was a <!-- ### clay:page charset="UTF-8" / --> directive at the top of the file, they would be read with the default encoding set in web.xml, and one wouldn't have to define this config in each and every template file. If no app-wide option was defined in web.xml (null), then Clay would fall back to the VM's default file.encoding.

How about this?

Gary VanMatre added a comment - 28/Sep/06 02:22 AM
This is the first try at resolving this issue. It will be available in the shale-framework-20060928 nightly build. You can find it here: http://people.apache.org/builds/shale/nightly/.

To summarize the changes based on your notes, the encoding is now determined with the following steps:

1) Look at the top of the template for the token comment containing the charset.

 For example: <!-- ### clay:page charset="UTF-8" /### -->

2) If not found, look for the app wide config option for template encoding. If found use the encoding for reading the template.

For example:
  <context-param>
    <param-name>org.apache.shale.clay.HTML_TEMPLATE_CHARSET</param-name>
    <param-value>UTF-8</param-value>
  </context-param>

3) If not found, use the VM's default "file.encoding".

4) Read the template in with the determined encoding
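
The resolution order in steps 1 to 3 could be sketched like this. The class and method names are hypothetical, the two arguments stand in for the value parsed from the template comment and the value of the context parameter, and `Charset.defaultCharset()` is used here as an approximation of the VM's "file.encoding". Skipping an unknown charset name instead of failing is also an assumption, not something this issue confirms.

```java
import java.nio.charset.Charset;

// Hypothetical helper; not the code that was committed to Clay.
final class TemplateCharsetResolver {

    // directiveCharset: charset named in the template's page comment, or null.
    // contextParamCharset: value of the app-wide context parameter, or null.
    static Charset resolve(String directiveCharset, String contextParamCharset) {
        Charset charset = lookup(directiveCharset);      // step 1
        if (charset != null) {
            return charset;
        }
        charset = lookup(contextParamCharset);           // step 2
        if (charset != null) {
            return charset;
        }
        return Charset.defaultCharset();                 // step 3
    }

    // Returns null for a missing, blank, illegal, or unsupported charset name.
    private static Charset lookup(String name) {
        if (name == null || name.trim().isEmpty()) {
            return null;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalArgumentException e) {
            return null;
        }
    }
}
```

In a web application the second argument would typically come from `ServletContext.getInitParameter("org.apache.shale.clay.HTML_TEMPLATE_CHARSET")`, and step 4 would pass the resolved charset to an `InputStreamReader`.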

Tom, I'm going to leave this open until you have a chance to verify.



Tom Pasierb added a comment - 29/Sep/06 08:36 PM
I have tried the updated clay version with html templates.

I experimented with -Dfile.encoding (system encoding setting), the org.apache.shale.clay.HTML_TEMPLATE_CHARSET context init parameter and <!-- ### clay:page charset="UTF-8" /### -->, and everything works as expected, so I guess this issue can be closed.

Thanks Gary :-)

Gary VanMatre added a comment - 16/Oct/06 02:40 AM
The examples and your input really made the difference on this one. Thanks for the help Tom.