Velocity: VELOCITY-519

Java escape sequences should work in Velocity macros

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Later
    • Affects Version/s: 1.5 beta2
    • Fix Version/s: 2.x
    • Component/s: Engine
    • Labels:
      None

      Description

      The following test should work:

      ===
      public void testJavaEscape() throws Exception
      {
          VelocityEngine ve = new VelocityEngine();
          ve.init();
          Context context = new VelocityContext();

          // A Unicode escape in a string literal should yield the letter 'a'
          StringWriter writer = new StringWriter();
          ve.evaluate(context, writer, "test", "#set($v = \"\\u0061\")$v");
          assertEquals("a", writer.toString());

          // A Java-style newline escape should yield an actual newline
          writer = new StringWriter();
          ve.evaluate(context, writer, "test", "#set($v = \"\\n\")$v");
          assertEquals("\n", writer.toString());
      }
      ===

          Activity

          Nathan Bubna added a comment -

          Sorry Stepan, VTL (Velocity Template Language) is not Java, nor do we have any intention of making it so. It is designed to be a simple templating language with a small feature set that is quick for anyone to learn. We have debated supporting various escaping schemes in string literal definitions several times in the past (that's what your example is about, not macros), but the only escaping feature that gathered a consensus among the developers was MySQL-like escaping of quotes and double quotes, and even that has not been added yet. Search the archives for more history on this.

          Do your escaping in Java or create a tool (or even a patch for this: http://velocity.apache.org/tools/devel/javadoc/org/apache/velocity/tools/generic/EscapeTool.html) to support doing this within templates.

          Personally, the soonest I would be interested in reopening discussion on escaping within string definitions would be when work has started on Velocity 2.0. Until then, this gets a -1 from me.

          Will Glass-Husain added a comment -

          I don't see this as something worth spending time on myself, but I think I'd accept a patch to add it.

          It's pretty basic functionality, and a fairly isolated change to the JavaCC tokenizing code. I'd want to see something comprehensive (e.g. supporting \n and \r as well as Unicode) before committing.

          It would help, Stepan, if you could give some use cases describing why this is needed and what problems it would solve.

          Nathan Bubna added a comment -

          I don't see how this could be done in a backwards-compatible way. I don't want to be pigheaded about this, but I really think this doesn't belong in a 1.x version. Of course, there's no reason someone can't start working on 2.0 now...

          Stepan Koltsov added a comment -

          Nathan, EscapeTool does escaping. I need something opposite.

          I need the ability to insert any Unicode character in a template. Any programming language, templating language or markup language allows this. Velocity should too.

          Inserting concrete characters is not unescaping.

          My templates are stored in an encoding that is not Unicode, so it is not possible to insert some characters I need (some Unicode characters, like the em dash). In Java I can write \u2014; in HTML, &mdash;. I want something similar in Velocity.

          I think it is a bad idea to have a special tool to generate characters.

          Stepan Koltsov added a comment -

          Trivial patch, not backward compatible.

          Christopher Schultz added a comment -

          Stepan,
          The only reason that Java needs Unicode escaping in source files is that Java source files are defined to be ISO-8859-1. You simply cannot put higher characters like Kanji into a Java source file, hence the \u1234 escape sequences. Same thing with properties files.

          Velocity template files have no such restrictions IIRC. Why not simply use UTF-8 encoding and put your special characters directly into your template files? There's really no need for escaping of these kinds of things.

          Now, newline escaping is another story, unless there is a non '\n' (or '\r') newline character that I don't know about.

          Stepan Koltsov added a comment -

          Another patch that enables only \u escapes, in a backward-compatible way.

          Stepan Koltsov added a comment -

          Christopher,

          It is not easy to work with some UTF-8 characters in Velocity templates. For example, it is hard to maintain code containing an em dash in a text editor with a fixed-width font. With some Unicode characters, like the non-breaking space, it is not possible to work in most text editors.

          BTW, you are not right that Java mandates ISO-8859-1:

          [yozh@PowerBook:...ft/velocity/engine-trunk]% javac -help |& grep encoding
          -encoding <encoding> Specify character encoding used by source files

          Another reason for unicode escapes is that my company mandates that all source files must be in windows-1251.

          I personally don't need \r or \n escapes, but I think they should be enabled in version 2.0.
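To make the encoding side of this exchange concrete, here is a small stand-alone Java check. It assumes ISO-8859-1 as a stand-in for any legacy 8-bit template encoding, and the class and method names are illustrative only. It shows that U+2014 EM DASH has no representation in that charset while its six-character escape is plain ASCII, and that a no-break space is a different character from an ordinary space even though editors usually draw them the same.

```java
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // True if every character of s has a representation in ISO-8859-1,
    // used here as a stand-in for any legacy 8-bit template encoding.
    static boolean fitsLatin1(CharSequence s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String emDash = "\u2014";     // the em dash character itself
        String escaped = "\\u2014";   // six ASCII characters: backslash, u, 2, 0, 1, 4
        String nbsp = "\u00A0";       // NO-BREAK SPACE

        System.out.println(fitsLatin1(emDash));   // false: no Latin-1 byte for U+2014
        System.out.println(fitsLatin1(escaped));  // true: the escape is plain ASCII
        System.out.println(nbsp.equals(" "));     // false: looks like a space, but is not
    }
}
```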

          Nathan Bubna added a comment -

          Thanks. I'll let Will take responsibility for the Unicode escapes patch if he's willing. If not, I guess I'd be willing to do it, since it's better than the current bug.

          As for the line breaks, I don't see why anyone should need them when they can just put the actual carriage return or newline right into the string.

          Will Glass-Husain added a comment -

          Stepan makes good points (I believe) about the utility of including escaped unicode characters. Actually, I face similar issues from time to time. My main text editor doesn't do a good job with UTF-8.

          I advocate \n, \r, and more importantly, \t for comprehensiveness. If we allow unicode escaping then users will expect the other items. Are there more that need to be included?

          I can imagine cases where inserting a tab character might be useful. For example, you might want to compare a string to \t in an #if statement. The following

          #if($samplestring.contains("\t"))

          is more readable than the same test with an actual tab character typed between the quotes:

          #if($samplestring.contains("	"))

          You could do this with a tool, but this is simpler syntax.

          WILL

          Will Glass-Husain added a comment -

          Been thinking-- is this backwards compatible? I'm not sure.

          Mostly, it is. But if a user is generating Java code and expects the escape sequences to pass through, then we're in trouble.

          Any thoughts?

          Nathan Bubna added a comment -

          No, it's not backwards compatible. The Unicode escapes are backwards compatible only by virtue of the fact that they previously caused an error; \n, \r and \t are not backwards compatible. And if you want to insert one of those characters, just insert it in the string. There's no need to enter it in escaped form and then make Velocity convert it for you when you can just put the character in the string yourself. No tools or patches necessary here, and definitely no need to break backwards compatibility.
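The compatibility argument can be sketched as a translation step that recognizes only \uXXXX. This is a hypothetical helper, not the actual VELOCITY-520 patch: a backslash-u followed by exactly four hex digits becomes the named character, and every other backslash sequence, including \n, \r and \t, is copied through untouched, so a string containing no \u escape comes out exactly as before (for example in templates that emit Java code). For brevity it does not let a doubled backslash suppress translation.

```java
public class UnicodeEscapes {
    private static final String HEX = "0123456789abcdefABCDEF";

    /**
     * Replaces each backslash-u escape that is followed by exactly four hex
     * digits with the character it names. Any other backslash sequence
     * (backslash-n, backslash-t, a lone backslash, ...) is copied unchanged.
     * Simplification: a doubled backslash does not suppress translation.
     */
    static String translate(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 5 < s.length() && s.charAt(i + 1) == 'u'
                    && isHex(s, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6; // consume backslash, 'u' and the four digits
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (HEX.indexOf(s.charAt(k)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Literal contents as a template author would type them between the quotes.
        System.out.println(translate("\\u0061"));       // a
        System.out.println(translate("tab:\\t stays")); // tab:\t stays
    }
}
```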

          Will Glass-Husain added a comment -

          That's a good point. If the \u00b1 format didn't work at all before, there's no harm in adding it in. And it adds new capability. (inserting characters into the body that you might not have been able to before).

          But if \t, \r, \n were just passed through (and possibly this was desired by someone generating Java code), we shouldn't intercept them. So I withdraw my suggestion for \t, \r, \n parsing.

          Will Glass-Husain added a comment -

          Assigning to 1.6, since there's consensus we should do something here.

          Christopher Schultz added a comment -

          If \uXXXX syntax works, then it's still possible to insert newlines (\u000a and \u000d) and tabs (\u0009), even if the escape sequences aren't as recognizable.

          Another option would be to fix bug VELOCITY-520 such that \uXXXX escapes do not cause an error any longer, and then create a macro that takes a string and converts \u and \r, \n, \t, etc. escape sequences into their actual characters.
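Christopher's second option moves the decoding out of the lexer and into something the template calls explicitly. A minimal sketch of that decoding logic in Java, assuming it would be exposed to templates as a context tool; the `EscapeDecoder` name and the exact set of escapes handled are illustrative, not an existing Velocity or VelocityTools API:

```java
public class EscapeDecoder {
    private static final String HEX = "0123456789abcdefABCDEF";

    /**
     * Decodes Java-style backslash escapes: backslash followed by n, r, t,
     * a double quote, another backslash, or u plus four hex digits.
     * Unrecognised or malformed sequences are copied through verbatim.
     */
    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c); // ordinary character, or a trailing lone backslash
                continue;
            }
            char next = s.charAt(i + 1);
            switch (next) {
                case 'n':  out.append('\n'); i++; break;
                case 'r':  out.append('\r'); i++; break;
                case 't':  out.append('\t'); i++; break;
                case '"':  out.append('"');  i++; break;
                case '\\': out.append('\\'); i++; break;
                case 'u':
                    if (i + 5 < s.length() && isHex(s.substring(i + 2, i + 6))) {
                        out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                        i += 5; // move to the last hex digit; the loop increment steps past it
                    } else {
                        out.append(c); // malformed: keep the backslash, 'u' is copied next pass
                    }
                    break;
                default:
                    out.append(c); // unknown escape: keep the backslash as typed
            }
        }
        return out.toString();
    }

    private static boolean isHex(String digits) {
        for (char d : digits.toCharArray()) {
            if (HEX.indexOf(d) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String typed = "col1\\tcol2\\u2014end"; // as typed inside a template string
        System.out.println(decode(typed));
    }
}
```

Because the conversion only happens where a template explicitly asks for it, existing string literals keep their current meaning, which is the property Nathan and Will were concerned about.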

          Christoph Reck added a comment -

          If inserting and automatically translating Unicode characters works, the template writer could predefine and use $tab, $CR, $LF constants instead of \t, \r, \n, thus keeping backward compatibility (BC).

          #set( $tab = "\u0009" )
          ...
          #if( $samplestring.contains($tab) )

          P.S. In the past I used URLDecoder to achieve this.
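The comment does not say exactly how the decoder was used. One plausible reading, assuming results of the standard `java.net.URLDecoder` were made available to the template, is that percent-encoding offers another ASCII-only way to spell control characters and non-ASCII text. A small check using `URLDecoder.decode(String, String)` (note that this form-decoding method also turns `+` into a space); the class name is illustrative:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PercentDecoded {
    // Decodes application/x-www-form-urlencoded text, interpreting bytes as UTF-8.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String tab = decode("%09");          // U+0009 CHARACTER TABULATION
        String crlf = decode("%0D%0A");      // carriage return followed by line feed
        String emDash = decode("%E2%80%94"); // UTF-8 bytes of U+2014
        System.out.println(tab.equals("\t"));        // true
        System.out.println(crlf.equals("\r\n"));     // true
        System.out.println(emDash.equals("\u2014")); // true
    }
}
```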

          Nathan Bubna added a comment -

          The only consensus here is that we should fix the lexer error when \uXXXX appears in a string definition. This would be Stepan's second attached patch.

          And yes, this will allow you to insert tabs, carriage returns and line feeds by typing six characters for each instead of just typing the one tab, CR or LF character. Why do:
          #set( $tab = "\u0009" )
          which doesn't even work yet, when
          #set( $tab = "	" )
          (with an actual tab character typed between the quotes) already works?

          Nathan Bubna added a comment -

          Since VELOCITY-520 is fixed using the backwards-compatible, Unicode-escape-only patch, I'm going to resolve this as LATER. We can revisit the debate about other Java escape sequences when we work on version 2.0.

          Will Glass-Husain added a comment -

          Nice work, Nathan (and kudos to Stepan) on VELOCITY-520.

          I seem to be the only one on this thread who liked the idea of \t, \r, \n. But after thinking about it, I'm worried about compatibility of templates that specify those symbols when generating Java code, so I withdraw my suggestion.

          (or at least, I agree we should not do this for any 1.x releases).


            People

            • Assignee: Unassigned
            • Reporter: Stepan Koltsov
            • Votes: 0
            • Watchers: 0
