Thrift
  1. Thrift
  2. THRIFT-153

Invalid code generated for string constants containing single quotes (')

    Details

    • Patch Info:
      Patch Available

      Description

      If I declare a constant in a Thrift iDL like this:

      const string HAMMER_TIME = "Can't touch this";

      The generated Python sources generate a constant like this:

      HAMMER_TIME = 'Can't touch this';

      This doesn't work because the apostrophe terminates the string constant.

      Since Python supports string constants using either ' or ", an easy fix is just to use " around Python constants instead:

      Index: compiler/cpp/src/generate/t_py_generator.cc
      ===================================================================
      — compiler/cpp/src/generate/t_py_generator.cc (revision 701711)
      +++ compiler/cpp/src/generate/t_py_generator.cc (working copy)
      @@ -363,7 +363,7 @@
      t_base_type::t_base tbase = ((t_base_type*)type)->get_base();
      switch (tbase) {
      case t_base_type::TYPE_STRING:

      • out << "'" << value->get_string() << "'";
        + out << '"' << value->get_string() << '"';
        break;
        case t_base_type::TYPE_BOOL:
        out << (value->get_integer() > 0 ? "True" : "False");

        Activity

        Hide
        David Reiss added a comment -

        Tentatively marking this resolved. Feel free to reopen if I broke something.

        Show
        David Reiss added a comment - Tentatively marking this resolved. Feel free to reopen if I broke something.
        Hide
        David Reiss added a comment -

        Yeah, I thought that felt a little sketchy.

        Show
        David Reiss added a comment - Yeah, I thought that felt a little sketchy.
        Hide
        Spiros Eliopoulos added a comment -

        I tried compiling with dreiss' patch and got the following error:

        thriftl.ll: In function 'int yylex()':
        thriftl.ll:263: error: 'mark' cannot appear in a constant-expression

        The attached patch fixes this problem by putting the mark case in default.

        Show
        Spiros Eliopoulos added a comment - I tried compiling with dreiss' patch and got the following error: thriftl.ll: In function 'int yylex()': thriftl.ll:263: error: 'mark' cannot appear in a constant-expression The attached patch fixes this problem by putting the mark case in default.
        Hide
        Alexander Shigin added a comment -

        Oh, there is something wrong with me in past two days. I've missed your comments with new patch at all David, sorry.

        I'll check the modeline, because I do hate the "comment fix". Perl '\n' was a big surprise for me. The absent of test/const-check is even worse. I haven't made one . It's hard for me to create test suite because I have to read a manual about automake, autoconf, etc...

        Your decision to make get_escaped_string is better than send this to const_value.

        Here a couple strange things in your patch:

        1. we don't need t_php_generator::escape_string anymore;
        2. why do you escape ' in html generator?

        So, if escaping is right it's lgtm.

        Show
        Alexander Shigin added a comment - Oh, there is something wrong with me in past two days. I've missed your comments with new patch at all David, sorry. I'll check the modeline, because I do hate the "comment fix". Perl '\n' was a big surprise for me. The absent of test/const-check is even worse. I haven't made one . It's hard for me to create test suite because I have to read a manual about automake, autoconf, etc... Your decision to make get_escaped_string is better than send this to const_value. Here a couple strange things in your patch: we don't need t_php_generator::escape_string anymore; why do you escape ' in html generator? So, if escaping is right it's lgtm.
        Hide
        David Reiss added a comment -

        I think you forgot to add the files under test/const-check

        Show
        David Reiss added a comment - I think you forgot to add the files under test/const-check
        Hide
        Alexander Shigin added a comment -

        The new version of patch.

        Show
        Alexander Shigin added a comment - The new version of patch.
        Hide
        David Reiss added a comment -

        Awesome patch. I made a bunch of small changes like adding escapes, fixing a few generators, making the call sites more uniform, etc. I also changed the way get_escaped_string is called. I like this way better, and you seemed unsure about the old way too. I also eliminated your vim comment. My guess is that you had the wrong filetype set because Vim defaults to "LifeLines" for ".ll", even though it concedes that it is also used for Lex for C++. Let me know if the modeline patch doesn't fix it.

        Show
        David Reiss added a comment - Awesome patch. I made a bunch of small changes like adding escapes, fixing a few generators, making the call sites more uniform, etc. I also changed the way get_escaped_string is called. I like this way better, and you seemed unsure about the old way too. I also eliminated your vim comment. My guess is that you had the wrong filetype set because Vim defaults to "LifeLines" for ".ll", even though it concedes that it is also used for Lex for C++. Let me know if the modeline patch doesn't fix it.
        Hide
        Alexander Shigin added a comment -

        The patch adds escaping method to generator and fixes lexer to allow strings like 'I don\'t know'. Can anybody check if I escape strings right?

        I don't know if t_const_value::get_escaped_string is a right decision.

        Show
        Alexander Shigin added a comment - The patch adds escaping method to generator and fixes lexer to allow strings like 'I don\'t know'. Can anybody check if I escape strings right? I don't know if t_const_value::get_escaped_string is a right decision.
        Show
        David Reiss added a comment - http://wiki.apache.org/thrift/EscapingVote
        Hide
        Mark Slee added a comment -

        Ah, yes, I see. Totally overlooked that. And yes, you're right, we need to make the parser also understand escaping.

        Show
        Mark Slee added a comment - Ah, yes, I see. Totally overlooked that. And yes, you're right, we need to make the parser also understand escaping.
        Hide
        Chad Walters added a comment -

        Mark, I think that Dave points out a flaw that should be addressed, though. We do need some kind of escaping in Thrift IDL or we can't represent strings with both single quotes and double quotes in them. However, as you correctly point out, this escaping should be separate from the escaping needs of the target languages.

        My recommendation:
        1. Beef up the IDL parser to allow some form of slash escaping in the constant strings. This would not be fully backwards compatible since existing '\' in strings would need to be escaped. The parser would unescape these characters as they are read in.
        2. Have the compiler perform the appropriate language-specific escaping as you suggest above.

        Show
        Chad Walters added a comment - Mark, I think that Dave points out a flaw that should be addressed, though. We do need some kind of escaping in Thrift IDL or we can't represent strings with both single quotes and double quotes in them. However, as you correctly point out, this escaping should be separate from the escaping needs of the target languages. My recommendation: 1. Beef up the IDL parser to allow some form of slash escaping in the constant strings. This would not be fully backwards compatible since existing '\' in strings would need to be escaped. The parser would unescape these characters as they are read in. 2. Have the compiler perform the appropriate language-specific escaping as you suggest above.
        Hide
        Mark Slee added a comment -

        The escaping in the Thrift file should be separate from the escaping of generated code. We should NOT be putting slashes into a .thrift file to account for how they'll be escaped in Python, for example.

        The correct fix is to change the generators on a per-language basis to properly escape the strings that they are generating.

        • out << "'" << value->get_string() << "'";
          + out << "'''" << value->get_string() << "'''";

        This should look like:

        + out << "'''" << this->escape_py_literal(value->get_string()) << "'''";

        And the method escape_*_literal should take care of proper escaping on a per-language basis.

        Show
        Mark Slee added a comment - The escaping in the Thrift file should be separate from the escaping of generated code. We should NOT be putting slashes into a .thrift file to account for how they'll be escaped in Python, for example. The correct fix is to change the generators on a per-language basis to properly escape the strings that they are generating. out << "'" << value->get_string() << "'"; + out << "'''" << value->get_string() << "'''"; This should look like: + out << "'''" << this->escape_py_literal(value->get_string()) << "'''"; And the method escape_*_literal should take care of proper escaping on a per-language basis.
        Hide
        Alexander Shigin added a comment -

        It seems so.

        And my patch should be reworked. It's uncommon to use triple single quote, but if anyone use it, he gets an error. I think it's better to replace ' to \' and use single quote as guard.

        Your string should be escaped like: 'Can\'t touch this'.

        Show
        Alexander Shigin added a comment - It seems so. And my patch should be reworked. It's uncommon to use triple single quote, but if anyone use it, he gets an error. I think it's better to replace ' to \' and use single quote as guard. Your string should be escaped like: 'Can\'t touch this'.
        Hide
        Dave Engberg added a comment -

        Ah, ok. I've only used double-quotes for string constants in Thrift. In that case, it sounds like triple-quotes is the right fix for now, thanks.

        So it sounds like there's no way to write a string constant in a Thrift IDL which contains both a single quote and a double quote character?

        Show
        Dave Engberg added a comment - Ah, ok. I've only used double-quotes for string constants in Thrift. In that case, it sounds like triple-quotes is the right fix for now, thanks. So it sounds like there's no way to write a string constant in a Thrift IDL which contains both a single quote and a double quote character?
        Hide
        Alexander Shigin added a comment -

        You have to use 'string with " double quote' and "string with ' quote" in your thrift file. You can't escape single or double quote with back slash or something like it. The literal token is described in thriftl.ll file:

        dliteral      ("\""[^"]*"\"")
        sliteral      ("'"[^']*"'")
        

        I think it should be fixed.

        Show
        Alexander Shigin added a comment - You have to use 'string with " double quote' and "string with ' quote" in your thrift file. You can't escape single or double quote with back slash or something like it. The literal token is described in thriftl.ll file: dliteral ( "\" "[^" ]* "\" ") sliteral ( "'" [^']* "'" ) I think it should be fixed.
        Hide
        Dave Engberg added a comment -

        That sounds fine to me ... although the double-quotes would have to be escaped in the original Thrift IDL, so I assumed (incorrectly?) that this escaping would carry through to the generated code.

        Show
        Dave Engberg added a comment - That sounds fine to me ... although the double-quotes would have to be escaped in the original Thrift IDL, so I assumed (incorrectly?) that this escaping would carry through to the generated code.
        Hide
        Alexander Shigin added a comment -

        It's better to use triple quote in this case, because your patch will break generated code if constant contains double quote.

        Here is my version of the patch.

        Index: compiler/cpp/src/generate/t_py_generator.cc
        ===================================================================
        --- compiler/cpp/src/generate/t_py_generator.cc	(revision 701711)
        +++ compiler/cpp/src/generate/t_py_generator.cc	(working copy)
        @@ -363,7 +363,7 @@
             t_base_type::t_base tbase = ((t_base_type*)type)->get_base();
             switch (tbase) {
             case t_base_type::TYPE_STRING:
        -      out << "'" << value->get_string() << "'";
        +      out << "'''" << value->get_string() << "'''";
               break;
             case t_base_type::TYPE_BOOL:
               out << (value->get_integer() > 0 ? "True" : "False");
        

        Sorry for double post.

        Show
        Alexander Shigin added a comment - It's better to use triple quote in this case, because your patch will break generated code if constant contains double quote. Here is my version of the patch. Index: compiler/cpp/src/generate/t_py_generator.cc =================================================================== --- compiler/cpp/src/generate/t_py_generator.cc (revision 701711) +++ compiler/cpp/src/generate/t_py_generator.cc (working copy) @@ -363,7 +363,7 @@ t_base_type::t_base tbase = ((t_base_type*)type)->get_base(); switch (tbase) { case t_base_type::TYPE_STRING: - out << "'" << value->get_string() << "'" ; + out << "'''" << value->get_string() << "'''" ; break ; case t_base_type::TYPE_BOOL: out << (value->get_integer() > 0 ? "True" : "False" ); Sorry for double post.
        Hide
        Dave Engberg added a comment -

        patch

        Show
        Dave Engberg added a comment - patch

          People

          • Assignee:
            Alexander Shigin
            Reporter:
            Dave Engberg
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development