Velocity
  1. Velocity
  2. VELOCITY-661

Parsing errors on content inside #literal() #end block

    Details

    • Type: Improvement Improvement
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.1
    • Fix Version/s: 1.7
    • Component/s: Engine
    • Labels:
      None
    • Environment:
      ALL

      Description

      I have some velocity templates that include quit some javascript. Inside the javascript a javascrip template engine is used which also uses $

      {varname}

      Escaping each occurance would make the code rather unreable, so to prevent velocity from parsing the javascript code, I put a #literal() around it.

      However, velocity still PARSES the contents of this block, which of course results in parsing exceptions.

      My feeling with "literal" is that it is completely UNINTERPRETED content?

      This SHOULD work:

      #literal()
      var myId = 'someID';
      $('#test).append($.template('<div id="$

      {myId}

      "></div>').apply(

      {myId: myId}

      ));
      #end

      1. velocity-661-v1.0.patch
        14 kB
        Jarkko Viinamäki

        Issue Links

          Activity

          Hide
          Niki Driessen added a comment -

          Thanks guys for all the effort, in the end, I don't really care if I have to type #[[ #%% or #(( , so whatever worked technically is perfect
          Good point about the required space cause I personnally tend to type (and a lot more people I suppose for readability) :

          #[[
          my
          unparsed
          content
          which can be
          pretty long
          ]]#

          which would indeed fail if I forgot to put spaces in, and that might be very confusing/hard to spot

          Anyway, thanks alot, now just waiting for 1.7 to be released In the meanwhile, I'll try out your test build in my project and let you know how that works out (we have some 150 templates, some over 5000 lines and we're using pretty "exotic constructs" here and there so I think it's a pretty good real-life test )

          Show
          Niki Driessen added a comment - Thanks guys for all the effort, in the end, I don't really care if I have to type #[[ #%% or #(( , so whatever worked technically is perfect Good point about the required space cause I personnally tend to type (and a lot more people I suppose for readability) : #[[ my unparsed content which can be pretty long ]]# which would indeed fail if I forgot to put spaces in, and that might be very confusing/hard to spot Anyway, thanks alot, now just waiting for 1.7 to be released In the meanwhile, I'll try out your test build in my project and let you know how that works out (we have some 150 templates, some over 5000 lines and we're using pretty "exotic constructs" here and there so I think it's a pretty good real-life test )
          Hide
          Nathan Bubna added a comment -

          Ok, #[[unparsed texblock]]# is now in. Hope ya'll like it. Report BC (backward compatibility) problems ASAP. I put up a test build at:

          http://people.apache.org/~nbubna/velocity/engine/1.7-dev/

          if you don't want to build your own from svn.

          Show
          Nathan Bubna added a comment - Ok, #[ [unparsed texblock] ]# is now in. Hope ya'll like it. Report BC (backward compatibility) problems ASAP. I put up a test build at: http://people.apache.org/~nbubna/velocity/engine/1.7-dev/ if you don't want to build your own from svn.
          Hide
          Nathan Bubna added a comment -

          Ok, i have it working and settled on #[[content]]# as the syntax. I figured the vague similarity to xml's use of <![CDATA[content]]> would help make it clear what this syntax is for.

          Any thoughts before i commit?

          Show
          Nathan Bubna added a comment - Ok, i have it working and settled on #[ [content] ]# as the syntax. I figured the vague similarity to xml's use of <![CDATA [content] ]> would help make it clear what this syntax is for. Any thoughts before i commit?
          Hide
          Nathan Bubna added a comment -

          Tricky is one word for it... irritating is another. I don't know if this is due to javacc or just how we are using javacc.

          Still, i suppose #(( block ))# isn't terrible. I wonder if #{{ }}# works, since that would feel more VTL-ish to me... I should have a few minutes after lunch to try it out.

          Show
          Nathan Bubna added a comment - Tricky is one word for it... irritating is another. I don't know if this is due to javacc or just how we are using javacc. Still, i suppose #(( block ))# isn't terrible. I wonder if #{{ }}# works, since that would feel more VTL-ish to me... I should have a few minutes after lunch to try it out.
          Hide
          Jarkko Viinamäki added a comment -

          This is a bit tricky issue.

          #%%this does not work%%# (the parser chokes with #%%#%%#)

          #!Unable to render embedded object: File (this does not work either) not found.!#

          For some reason #((this works))# and even #((#))# this works too.

          So if you replace %% with (( then the whitespace is not needed in the token and this works too:

          #((
          block
          ))#

          I'm not sure why %% and !! don't work.

          Show
          Jarkko Viinamäki added a comment - This is a bit tricky issue. #%%this does not work%%# (the parser chokes with #%%#%%#) #! Unable to render embedded object: File (this does not work either) not found. !# For some reason #((this works))# and even #((#))# this works too. So if you replace %% with (( then the whitespace is not needed in the token and this works too: #(( block ))# I'm not sure why %% and !! don't work.
          Hide
          Nathan Bubna added a comment -

          Hmm. It seems that the parser likes "#% " better than "#%%". With the former, it can handle #% # %# better than #%%#%%#, which i suppose isn't surprising. I'm not sure how much more time i have for this, but i'm not ready to give up on #%% %%# without a bit more effort.

          Show
          Nathan Bubna added a comment - Hmm. It seems that the parser likes "#% " better than "#%%". With the former, it can handle #% # %# better than #%%#%%#, which i suppose isn't surprising. I'm not sure how much more time i have for this, but i'm not ready to give up on #%% %%# without a bit more effort.
          Hide
          Nathan Bubna added a comment -

          Hmm. I don't think "#% " is going to be that much more common, but i've also started thinking that the required space might be confusing if someone tries:

          #%
          block
          %#

          which is very likely, i think. The extra space also makes "#% " a challenge for both writer and reader of documentation. So, i think we need to do either #% or #%%, without the space. #%% is definitely less likely, but really, the only situations i can think of for #% are "fake cussing" like #%@$! or maybe some regex. In both of those already unusual cases, #%% seems much less likely.

          I do prefer the way that #% block %# looks, but even #%% block %%# looks a lot better than the current workarounds. I think i'll adapt the patch to do the latter one, as i'd rather not have any fake cussing lead to the real deal. Also, the latter would be much more forward compatible if we later decide to cut it back to #% block %# for aesthetic (or other) reasons.

          Show
          Nathan Bubna added a comment - Hmm. I don't think "#% " is going to be that much more common, but i've also started thinking that the required space might be confusing if someone tries: #% block %# which is very likely, i think. The extra space also makes "#% " a challenge for both writer and reader of documentation. So, i think we need to do either #% or #%%, without the space. #%% is definitely less likely, but really, the only situations i can think of for #% are "fake cussing" like #%@$! or maybe some regex. In both of those already unusual cases, #%% seems much less likely. I do prefer the way that #% block %# looks, but even #%% block %%# looks a lot better than the current workarounds. I think i'll adapt the patch to do the latter one, as i'd rather not have any fake cussing lead to the real deal. Also, the latter would be much more forward compatible if we later decide to cut it back to #% block %# for aesthetic (or other) reasons.
          Hide
          Jarkko Viinamäki added a comment -

          OTOH, I started to think how likely is it that some templates of existing applications contain the "#% " pattern? This is more problematic than the end token since this token starts the textblock and may break rendering.

          So #%% %%# would be definitely a safer solution although it's longer and not so pretty. Heh. So I ended up advocating both approaches. Technically both are equally easy to implement and there should not be difference in performance either.

          I trust Nathan will make the right choice.

          Show
          Jarkko Viinamäki added a comment - OTOH, I started to think how likely is it that some templates of existing applications contain the "#% " pattern? This is more problematic than the end token since this token starts the textblock and may break rendering. So #%% %%# would be definitely a safer solution although it's longer and not so pretty. Heh. So I ended up advocating both approaches. Technically both are equally easy to implement and there should not be difference in performance either. I trust Nathan will make the right choice.
          Hide
          Nathan Bubna added a comment -

          Good point about the two halves workaround. And if #% block %# is simpler on the implementation side, then let's stick with it. I'll see if i can find some time this afternoon to test this and check it in, as it'll take a little while to get you karma even after your CLA is in.

          Show
          Nathan Bubna added a comment - Good point about the two halves workaround. And if #% block %# is simpler on the implementation side, then let's stick with it. I'll see if i can find some time this afternoon to test this and check it in, as it'll take a little while to get you karma even after your CLA is in.
          Hide
          Jarkko Viinamäki added a comment -

          Thanks for the comments. I don't have the energy to start building separate patch files of those changes right now. Nathan, I'll get back to you about those CLA issues.

          I experimented with different token types and indeed at one point I considered using "#%% " (forgot to update the JavaDoc) but it's not as simple as "#% " which I find better. It's consistent with the regular comment format #* this is a comment *#

          If you need to include " %#" (notice that it has to contain the leading whitespace to cause "problems") inside #% block %#, then you can easily work around it e.g. by dividing the textblock in half:

          #% this is the first part %# %# #% this is the second part %#

          will result in
          this is the first part %# this is the second part

          Note that it's perfectly ok to write: #% this is a comment with stuff%# in it %#

          I'd guess that " %#" is unlikely enough. Anyway, it's a matter of taste and both work. I just like simplicity.

          Show
          Jarkko Viinamäki added a comment - Thanks for the comments. I don't have the energy to start building separate patch files of those changes right now. Nathan, I'll get back to you about those CLA issues. I experimented with different token types and indeed at one point I considered using "#%% " (forgot to update the JavaDoc) but it's not as simple as "#% " which I find better. It's consistent with the regular comment format #* this is a comment *# If you need to include " %#" (notice that it has to contain the leading whitespace to cause "problems") inside #% block %#, then you can easily work around it e.g. by dividing the textblock in half: #% this is the first part %# %# #% this is the second part %# will result in this is the first part %# this is the second part Note that it's perfectly ok to write: #% this is a comment with stuff%# in it %# I'd guess that " %#" is unlikely enough. Anyway, it's a matter of taste and both work. I just like simplicity.
          Hide
          Niki Driessen added a comment -

          Thanks alot guys for all the feedback and suggested solutions!

          The solution with #%% this is not parsed at all %%# looks perfect to me (and consistent like Nathan mentioned in the comment before me)
          Indeed, should anyone still need #%% sequence in that block (I can not immediately point this to any syntax I know off, so prob quite unlikely) there are work arounds like with the #include('bla.txt') I'm using now

          Can someone give me a pointer as to which "stable" release this fix could go in? I need to run this in a secure production environment so can't really start patching libraries myself...

          Show
          Niki Driessen added a comment - Thanks alot guys for all the feedback and suggested solutions! The solution with #%% this is not parsed at all %%# looks perfect to me (and consistent like Nathan mentioned in the comment before me) Indeed, should anyone still need #%% sequence in that block (I can not immediately point this to any syntax I know off, so prob quite unlikely) there are work arounds like with the #include('bla.txt') I'm using now Can someone give me a pointer as to which "stable" release this fix could go in? I need to run this in a secure production environment so can't really start patching libraries myself...
          Hide
          Nathan Bubna added a comment -

          No, let's keep it simple. If people really need the end token in the middle, their are plenty of workarounds. No need to complicate our parser for such an unusual corner case.

          I did notice in the ASTTextblock javadoc that at one point you had the sequence as:

          #%% this is included but not parsed %%#

          I actually kinda like that better since it is somewhat parallel too:

              • comments that are removed **#

          and is also an even less likely sequence.

          Show
          Nathan Bubna added a comment - No, let's keep it simple. If people really need the end token in the middle, their are plenty of workarounds. No need to complicate our parser for such an unusual corner case. I did notice in the ASTTextblock javadoc that at one point you had the sequence as: #%% this is included but not parsed %%# I actually kinda like that better since it is somewhat parallel too: comments that are removed **# and is also an even less likely sequence.
          Hide
          Byron Foster added a comment -

          I like this allot better then the current solution. However, you have the problem of not being to have %# in the block, not a big deal. Maybe a future addition would be to add an optional keyword in, like:

          #end%
          <stuff>
          %end#

          Show
          Byron Foster added a comment - I like this allot better then the current solution. However, you have the problem of not being to have %# in the block, not a big deal. Maybe a future addition would be to add an optional keyword in, like: #end% <stuff> %end#
          Hide
          Nathan Bubna added a comment -

          The patch looks pretty good, though i do prefer separate patches for separate features/fixes. Anyway, i'm in favor of this. It would make #literal rather useless, but that's not a bad thing. Once this is in, we should probably deprecate #literal and replace its documentation sections with documentation of this.

          It would be nice if you could just check these things in yourself (in separate commits, ideally). Any progress on getting a CLA in?

          Show
          Nathan Bubna added a comment - The patch looks pretty good, though i do prefer separate patches for separate features/fixes. Anyway, i'm in favor of this. It would make #literal rather useless, but that's not a bad thing. Once this is in, we should probably deprecate #literal and replace its documentation sections with documentation of this. It would be nice if you could just check these things in yourself (in separate commits, ideally). Any progress on getting a CLA in?
          Hide
          Jarkko Viinamäki added a comment -

          OK. Here's my attempt to fix this issue.

          This patch introduces a new token type which I call "Textblock". Textblocks can be marked like this:

          #% here you can have any text, characters etc. %#

          Velocity will not attempt to parse anything between "#% " and " %#" and will output content between these tokens as such. Note that one whitespace is part of each token.

          With this feature it is easy to deal with issues described e.g. in VELOCITY-584

          Reasons for selecting this token format:
          1. It's short and thus easy to use.
          2. It is quite unlikely that you need to have string " %#" inside the area you want to "escape" with a Textblock. Thus there is no need for a custom end token (which would have been tricky to implement). Note that you can have %# inside the area as long as it doesn't have whitespace in front of it.

          This patch also includes some performance tweaks made to the parser:
          1. Transformation StringBuffer -> StrBuilder is done with Ant script after the generation.
          2. Parser state is stored using a custom inner class instead of Hashtable which wastes space.

          Note that I'm a total newbie with JavaCC (I took a closer look at it today for the first time) so the parser might not be as efficient as possible. I had to use the JavaCC MORE feature when in IN_TEXTBLOCK mode. I'm not sure if this is the only way.

          NOTICE! You must use "ant parser" command to process the Parser.jjt which then generates Parser.java etc. I didn't include the generated files in this patch since TortoiseSVN complained something about invalid linefeeds. I noticed that JavaCC 4.2 is incompatible with existing classes. JavaCC 4.0 worked OK.

          Tweaks to the build.xml:
          1. Now it's possible to execute just one Test Case by defining the test case name:
          ant -Dtestcase=org.apache.velocity.test.TextblockTestCase test

          This makes debugging a lot faster.

          2. Generated ParserTokenManager is tweaked to use StrBuilder by simple search & replace.

          Show
          Jarkko Viinamäki added a comment - OK. Here's my attempt to fix this issue. This patch introduces a new token type which I call "Textblock". Textblocks can be marked like this: #% here you can have any text, characters etc. %# Velocity will not attempt to parse anything between "#% " and " %#" and will output content between these tokens as such. Note that one whitespace is part of each token. With this feature it is easy to deal with issues described e.g. in VELOCITY-584 Reasons for selecting this token format: 1. It's short and thus easy to use. 2. It is quite unlikely that you need to have string " %#" inside the area you want to "escape" with a Textblock. Thus there is no need for a custom end token (which would have been tricky to implement). Note that you can have %# inside the area as long as it doesn't have whitespace in front of it. This patch also includes some performance tweaks made to the parser: 1. Transformation StringBuffer -> StrBuilder is done with Ant script after the generation. 2. Parser state is stored using a custom inner class instead of Hashtable which wastes space. Note that I'm a total newbie with JavaCC (I took a closer look at it today for the first time) so the parser might not be as efficient as possible. I had to use the JavaCC MORE feature when in IN_TEXTBLOCK mode. I'm not sure if this is the only way. NOTICE! You must use "ant parser" command to process the Parser.jjt which then generates Parser.java etc. I didn't include the generated files in this patch since TortoiseSVN complained something about invalid linefeeds. I noticed that JavaCC 4.2 is incompatible with existing classes. JavaCC 4.0 worked OK. Tweaks to the build.xml: 1. Now it's possible to execute just one Test Case by defining the test case name: ant -Dtestcase=org.apache.velocity.test.TextblockTestCase test This makes debugging a lot faster. 2. Generated ParserTokenManager is tweaked to use StrBuilder by simple search & replace.
          Hide
          Christoph Reck added a comment -

          To avoid to have to make to many changes to the parser, maybe the interface or semantics of directives can be enhanced with a feature to tell the parser to go into a "simple" mode where it skips parsing into AST until the end marker is reached.

          It would be nice to be able to tell it what the end marker is, similar to the suggestion above (I changed it to contain the "#"):
          #literal('#litend')
          ...#end...
          #litend

          To leave a single limitation that requires avoiding an "#end" within the block is only partially acceptable.
          #set( $H = '#' )
          #literal()
          ...$

          {H}end...
          #end
          Custom directives that do their own magic with the block content would not know what to do with an embedded "${H}

          ", which would require some pre-processing/rendering of the block before using it (interesting idea... would allow some dynamically velocity created content before directive activity). Alternatively, the custom directive could allow its own escaping syntax.

          Show
          Christoph Reck added a comment - To avoid to have to make to many changes to the parser, maybe the interface or semantics of directives can be enhanced with a feature to tell the parser to go into a "simple" mode where it skips parsing into AST until the end marker is reached. It would be nice to be able to tell it what the end marker is, similar to the suggestion above (I changed it to contain the "#"): #literal('#litend') ...#end... #litend To leave a single limitation that requires avoiding an "#end" within the block is only partially acceptable. #set( $H = '#' ) #literal() ...$ {H}end... #end Custom directives that do their own magic with the block content would not know what to do with an embedded "${H} ", which would require some pre-processing/rendering of the block before using it (interesting idea... would allow some dynamically velocity created content before directive activity). Alternatively, the custom directive could allow its own escaping syntax.
          Hide
          Byron Foster added a comment -

          Yea, the issue's creator suggested purpose of #literal is much more useful, and the docs do gloss over allot of limitations as mentioned above, and also I'm afraid to test what it does to whitespace, which will be another surprise.

          Moving #literal in this direction would require more syntax, because of the obvious case of something like:

          #literal()
          #end
          #end

          Maybe something like:

          #literal("litend")
          #end
          #litend

          Or maybe something simpler and fixed like

          #literal
          #end
          #literal_end

          such that #literal_end would always end a literal, you could just never have this in a literal block. This solution would also be allot easier to add to the parser.

          Show
          Byron Foster added a comment - Yea, the issue's creator suggested purpose of #literal is much more useful, and the docs do gloss over allot of limitations as mentioned above, and also I'm afraid to test what it does to whitespace, which will be another surprise. Moving #literal in this direction would require more syntax, because of the obvious case of something like: #literal() #end #end Maybe something like: #literal("litend") #end #litend Or maybe something simpler and fixed like #literal #end # literal_end such that # literal_end would always end a literal, you could just never have this in a literal block. This solution would also be allot easier to add to the parser.
          Hide
          Nathan Bubna added a comment -

          Well, "interpret" is apparently not the clearest of terms, and the docs could use some improvement (no surprise there). Still, nothing within a #literal directive is interpreted, in the sense that it is rendered literally, as the examples show. However, #literal is not a special directive in any sense. The parser hasn't a clue what #literal intends to do with its content. So, everything within a #literal block is still parsed.

          Like i said, #literal was never designed to prevent parsing of content. #literal was to make it easy to escape large sections of valid VTL. like most escaping issues people have with Velocity, they expect to escape invalid VTL, but all of Velocity's escaping mechanisms (#literal included) were designed to escape valid VTL only. This is something i think everyone is ready to change in 2.0, but that doesn't mean it wasn't intentional when it was first done (not by me, i might add .

          So, to change the #literal directive to make the parser be lenient about its content, you will have to hardwire knowledge about it into the parser itself. I'm willing to help commit and test such a solution, but i really don't have time to develop it myself.

          Show
          Nathan Bubna added a comment - Well, "interpret" is apparently not the clearest of terms, and the docs could use some improvement (no surprise there). Still, nothing within a #literal directive is interpreted, in the sense that it is rendered literally, as the examples show. However, #literal is not a special directive in any sense. The parser hasn't a clue what #literal intends to do with its content. So, everything within a #literal block is still parsed. Like i said, #literal was never designed to prevent parsing of content. #literal was to make it easy to escape large sections of valid VTL. like most escaping issues people have with Velocity, they expect to escape invalid VTL, but all of Velocity's escaping mechanisms (#literal included) were designed to escape valid VTL only. This is something i think everyone is ready to change in 2.0, but that doesn't mean it wasn't intentional when it was first done (not by me, i might add . So, to change the #literal directive to make the parser be lenient about its content, you will have to hardwire knowledge about it into the parser itself. I'm willing to help commit and test such a solution, but i really don't have time to develop it myself.
          Hide
          Jarkko Viinamäki added a comment -

          I think Velocity should not interpret anything inside the literal directive. In fact, if you look at the documentation, that is the behaviour one would naturally expect:

          "the #literal script element allows the template designer to easily use large chunks of uninterpreted content in VTL code"
          http://velocity.apache.org/engine/devel/user-guide.html#stringliterals

          What would be an easy way to change the literal directive so that Velocity does not interpret anything inside it?

          Show
          Jarkko Viinamäki added a comment - I think Velocity should not interpret anything inside the literal directive. In fact, if you look at the documentation, that is the behaviour one would naturally expect: "the #literal script element allows the template designer to easily use large chunks of uninterpreted content in VTL code" http://velocity.apache.org/engine/devel/user-guide.html#stringliterals What would be an easy way to change the literal directive so that Velocity does not interpret anything inside it?
          Hide
          Nathan Bubna added a comment -

          #literal wasn't designed with that in mind. The parser only knows that #literal is a directive, not that it should treat it specially. Still, while not a bug, this would be a better way for #literal to work.

          Typical practice for your situation is to either "poor man's escaping":

          #set( $D = '$' )
          #set( $H = '#' )## or use EscapeTool: $esc.h and $esc.d
          var myId = 'someId';
          $D('$

          {H}

          test').append($

          {D}

          .template('<div id="$D

          {myId}

          "></div>').apply(

          {myId: myId}

          ));

          or to put long sections into a separate file and use #include( 'myscript.txt' ) to bring it back into the template. #include most definitely does not parse the files it brings in.

          Show
          Nathan Bubna added a comment - #literal wasn't designed with that in mind. The parser only knows that #literal is a directive, not that it should treat it specially. Still, while not a bug, this would be a better way for #literal to work. Typical practice for your situation is to either "poor man's escaping": #set( $D = '$' ) #set( $H = '#' )## or use EscapeTool: $esc.h and $esc.d var myId = 'someId'; $D('$ {H} test').append($ {D} .template('<div id="$D {myId} "></div>').apply( {myId: myId} )); or to put long sections into a separate file and use #include( 'myscript.txt' ) to bring it back into the template. #include most definitely does not parse the files it brings in.

            People

            • Assignee:
              Unassigned
              Reporter:
              Niki Driessen
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development