Groovy / GROOVY-8131

Statement continued onto next line is flagged when first character is "="

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.4.5
    • Fix Version/s: 2.6.0-alpha-1
    • Component/s: Compiler
    • Labels:
      None
    • Environment:
      Ubuntu Linux

      `uname -a`:
      Linux biostar 4.4.0-69-generic #90-Ubuntu SMP Thu Mar 16 16:52:31 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

      Description

      Source code attached (grbug.java).

      `javac` v8 compiles variable declarations s1, s2, and s3 successfully.

      `groovyc` flags s3:
      "unexpected token: = @ line 9, column 3."

      1. grbug.java
        0.4 kB
        Richard Elkins

        Activity

        paulk Paul King added a comment - - edited

        Currently, this is by design rather than a bug. Groovy treats the semicolon as a statement separator, not as a statement terminator as Java does.

        As another example, the closure passed to collect below contains two expressions: one returning 4, then one returning +3 (which is just 3). Groovy implicitly returns the value of the last expression:

        assert [3] == [1].collect {
          4
          + 3
        }
        

        Having said that, we do look ahead for closure left curly braces inside builders for instance. In theory we could enhance the grammar to handle additional cases like the equals but obviously not for all symbols (e.g. changing the '+' as per the above example would break existing code).
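The difference between a leading and a trailing operator can be seen in a minimal sketch like the one below. The variable names `leading` and `trailing` are made up for illustration; the first closure is the one from the comment above.

```groovy
// Leading operator: the newline ends the first statement, so the closure
// contains two statements (4, then unary +3) and returns the last one.
def leading = [1].collect {
    4
    + 3
}

// Trailing operator: the dangling '+' tells the parser that the
// expression continues on the next line, so the closure returns 4 + 3.
def trailing = [1].collect {
    4 +
    3
}

assert leading == [3]
assert trailing == [7]
```

This is why a general "treat newline as whitespace" rule cannot be applied to `+` without changing the meaning of existing code.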

        texadactyl Richard Elkins added a comment - - edited

        To me, the following two executable statements are semantically identical:

        static final SimpleDateFormat s2 =
        new SimpleDateFormat( "yyyy-MM-dd_hh:mm:ss" );

        static final SimpleDateFormat s3
        = new SimpleDateFormat( "yyyy-MM-dd_hh:mm:ss" );

        When I was a compiler constructor, newline characters (line continuation) were treated as whitespace (0x20). It doesn't matter whether the scanner employs a terminator or simply recognizes the end of a valid statement.

        When one escapes the newline character in the 1st line of the s3 declaration, then `groovyc` compiles the 2-line statement successfully.

        static final SimpleDateFormat s3m \
        = new SimpleDateFormat( "yyyy-MM-dd_hh:mm:ss" );

        If the newline is a statement separator (like semicolon), then why weren't both the s2 and s3m declarations flagged?
        1. They are declared as final yet they have no declared initialization before statement separation.
        2. The 2nd line should be a syntax error as they are assignments with nothing to assign to.

        Answer guess: Full statements are recognized semantically (terminate implicitly), regardless of the number of lines the statement occupies.

        I still think that the original report is against an undesirable feature of the `groovyc` scanner. If both lines are recognized as part of a single executable statement, then it should not matter whether the '=' character falls on line 1 or line 2. Semantically, the s2 and s3 declarations are identical.

        Am I missing some subtle philosophical point of Groovy?

        I can catch these types of differences in my Java source code fairly easily, hence, the report was classified as "minor".

        blackdrag Jochen Theodorou added a comment -

        When you treat newline as whitespace, you cannot make it part of the syntax of the language. That is why, for example, in Java you have to use the semicolon to tell the compiler that the statement is done. In Groovy the semicolon is optional, so the newline character is significant: you cannot treat it as whitespace anymore, and it becomes a possible terminator. But it is still not a terminator in the way the semicolon is. That means

        static final SimpleDateFormat s3
        = new SimpleDateFormat( "yyyy-MM-dd_hh:mm:ss" );
        

        has a valid final first line. That is why in Groovy we suggested the rule that a continued line should end with the operator, which makes it easier to tell the compiler that the next line belongs to the same statement. For the compiler, such operators include = and of course \ too. That is why s2 and s3m are no problem.
        So "terminate implicitly" sounds right to me, even though I would not call that semantic recognition, since it is still only the parser that tells the compiler whether a statement ends or not. For example, we do not treat both lines as possibly valid on their own in the parser and then somehow combine them into a single statement.

        Anyway... I think this kind of bug can be fixed in the antlr grammar
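The two continuation forms mentioned above (a line ending in `=` and a line ending in an escaped newline) can be sketched as follows. The class name `Formats` is hypothetical; the declarations are the s2 and s3m ones from the earlier comment, minus the optional semicolons.

```groovy
import java.text.SimpleDateFormat

class Formats {
    // The line ends with '=', so the parser keeps reading on the next line.
    static final SimpleDateFormat s2 =
        new SimpleDateFormat("yyyy-MM-dd_hh:mm:ss")

    // The backslash escapes the newline, so both lines form one statement.
    static final SimpleDateFormat s3m \
        = new SimpleDateFormat("yyyy-MM-dd_hh:mm:ss")
}

// Both declarations yield an initialized formatter with the same pattern.
assert Formats.s2.toPattern() == "yyyy-MM-dd_hh:mm:ss"
assert Formats.s3m.toPattern() == Formats.s2.toPattern()
```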

        texadactyl Richard Elkins added a comment - - edited

        Jochen,

        How can "static final SimpleDateFormat s3" be valid if the variable has been declared final but has no value assigned? Without the token "final", I would agree with you.

        Newline (\n) as part of the syntax of a language? Why would you want to make an invisible character (the text line terminator) part of a language? Python did it with tab characters (\t), and it drives everyone crazy.

        You might be thinking that I am picky or Johnny-come-lately with respect to Groovy. Both true. Just started compiling older Java to Groovy as part of self-education.

        paulk Paul King added a comment -

        If you compile your static final example with the latest snapshot versions of Groovy you will get:

        The variable [s3] may be uninitialized
        

        But in some sense that isn't what your main point is, so I won't dwell on that.

        Perhaps if you think of newline as a statement terminator, that will explain the current behavior. We do some but not very much lookahead. You are right that we could do smarter lookahead within the grammar but it isn't trivial.

        As an example, Groovy allows this statement:

        multiply 4 by 5
        

        which is the same as:

        multiply(4).by(5)
        

        With an implementation like:

        def multiply(multiplicand) { [by: { multiplier -> multiplicand * multiplier }] }
        

        Whereas this:

        multiply 4
        by 5
        

        is parsed as these two valid statements:

        multiply(4)
        by(5)
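A runnable sketch of the one-line versus two-line behavior is shown below. The `calls` list and the standalone `by` method are additions made up for illustration, so that each call can be observed; the `multiply` implementation otherwise follows the one in the comment above.

```groovy
// Script binding variable (no 'def'), so the script methods below can see it.
calls = []

def multiply(multiplicand) {
    calls << 'multiply(' + multiplicand + ')'
    [by: { multiplier ->
        calls << 'multiply(' + multiplicand + ').by(' + multiplier + ')'
        multiplicand * multiplier
    }]
}

// Hypothetical standalone method, needed only so that "by 5" on its own resolves.
def by(multiplier) { calls << 'by(' + multiplier + ')' }

// One line: a single command chain, multiply(4).by(5).
def product = multiply 4 by 5
assert product == 20
assert calls == ['multiply(4)', 'multiply(4).by(5)']

// Two lines: the newline separates two statements, multiply(4) and by(5).
calls.clear()
multiply 4
by 5
assert calls == ['multiply(4)', 'by(5)']
```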
        
        blackdrag Jochen Theodorou added a comment - - edited

        "static final SimpleDateFormat s3" is legal if you have a static initializer block, that sets the variable. But you need to put some semantics into the evaluation to determine that, which is why lexer and parser see that as valid statement.

        Why would we make newline part of the language? Well, because the semicolon is annoying. Seriously, if you did Groovy for a while and then go to Java, the semicolon becomes extremely annoying.
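The static-initializer case mentioned above might look like the following minimal sketch. The class name `Holder` is hypothetical; the point is only that the bare declaration on the first line is syntactically complete, and that whether it is ever initialized can only be decided by a later semantic check.

```groovy
import java.text.SimpleDateFormat

class Holder {
    // Syntactically complete on its own: no '=' is required here.
    static final SimpleDateFormat s3

    // The final field is assigned exactly once, in the static initializer.
    static {
        s3 = new SimpleDateFormat("yyyy-MM-dd_hh:mm:ss")
    }
}

assert Holder.s3.toPattern() == "yyyy-MM-dd_hh:mm:ss"
```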

        texadactyl Richard Elkins added a comment -

        Jochen,

        "static final SimpleDateFormat s3" is legal if you have a static initializer block, that sets the variable.

        But, I did not. See the attachment. That is why I wondered why it wasn't flagged.

        One person's annoyance is another person's point of clarity.

        texadactyl Richard Elkins added a comment -

        Paul & Jochen,

        I am not trying to drag out what you two may feel is being argumentative. Given that this is currently a feature, perhaps the Groovy documentation could highlight this and include some examples similar to mine.

        Richard

        paulk Paul King added a comment -

        Yes, improving the documentation is always a good thing. I found a brief reference in Groovy in Action but couldn't see any equivalent in the online documentation. PRs welcome!

        daniel_sun Daniel Sun added a comment -

        Anyway... I think this kind of bug can be fixed in the antlr grammar

        Jochen, I think the new parser Parrot can be refined to support some code like

        def a
        = 1 + 2
        

        but I am not going to make Parrot support

        def a = 1
        + 2
        

        In other words, a statement is continued only when the first token on the second line could not validly begin a statement of its own.
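The two cases above can be checked with a short script. This is a sketch that assumes a Groovy version where the Parrot parser is the default (Groovy 3.0 or later); on the older parser the first declaration is the one that fails to compile. The variable `b` is a renamed copy of the second example so that both can live in one script.

```groovy
// Supported after the fix: '=' cannot start a statement,
// so the parser joins it to the previous line.
def a
= 1 + 2
assert a == 3

// Deliberately not joined: '+ 2' is a valid statement by itself
// (unary plus), so b is initialized to 1 and '+ 2' is evaluated separately.
def b = 1
+ 2
assert b == 1
```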

        daniel_sun Daniel Sun added a comment -

        I've tried to fix it in the groovy-parser project:
        https://github.com/danielsun1106/groovy-parser/commit/68d791de2b88693febdfe15896ed8aea2dee9ed1

        blackdrag Jochen Theodorou added a comment -

        nice

        daniel_sun Daniel Sun added a comment -

        Fixed in the parrot branch.

        daniel_sun Daniel Sun added a comment -

        Besides =, I have refined the Parrot parser to support more expressions that span multiple lines:
        https://github.com/apache/groovy/commit/929bf81114a2f0ea4fa231e283ac3e3b4b2bc5d4

        PS: as you can see there, in each supported case the first token on the new line could not validly begin a statement of its own.


          People

          • Assignee: daniel_sun Daniel Sun
          • Reporter: texadactyl Richard Elkins
          • Votes: 0
          • Watchers: 4