Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.1, 0.13.0
    • Component/s: impl
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      PIG-3367 introduce "assert" keyword. However, it can only be used in Java embedding, when use registerQuery. In Pig script or Grunt shell, GruntParser will complain.

      1. PIG-3670-1.patch
        2 kB
        Daniel Dai
      2. PIG-3670-2_addition.patch
        6 kB
        Lorand Bendig
      3. PIG-3670-2.patch
        3 kB
        Cheolsoo Park

        Activity

        Hide
        Cheolsoo Park added a comment -

        +1. Thank you for fixing this.

        Your patch didn't apply cleanly to trunk, so I rebased it. In addition, I replaced the temporary file in the test with ByteArrayInputStream. Since PigServer.registerScript() accepts InputStream, it seems better to avoid to create a temporary file. Let me know if you think otherwise.

        Show
        Cheolsoo Park added a comment - +1. Thank you for fixing this. Your patch didn't apply cleanly to trunk, so I rebased it. In addition, I replaced the temporary file in the test with ByteArrayInputStream. Since PigServer.registerScript() accepts InputStream, it seems better to avoid to create a temporary file. Let me know if you think otherwise.
        Hide
        Daniel Dai added a comment -

        Patch committed to trunk. I would also like to commit to 0.12 branch since we claim assert work in 0.12.

        Thanks Cheolsoo for review and improve!

        Show
        Daniel Dai added a comment - Patch committed to trunk. I would also like to commit to 0.12 branch since we claim assert work in 0.12. Thanks Cheolsoo for review and improve!
        Hide
        Lorand Bendig added a comment -

        Thanks for fixing it! Now ASSERT works fine, however, grammar allows to assign alias to it which may lead to the following issue:

        A = load 'data.txt' AS (a0:int,a1:int,a2:int);
        B = ASSERT A by a0 >= 5, 'a0 should be >= 5'; 
        
        describe B;                                                                                      
        14/02/09 10:21:04 ERROR grunt.Grunt: ERROR 1003: Unable to find an operator for alias B
        
        C = foreach B generate $0;
        14/02/09 10:22:08 ERROR grunt.Grunt: ERROR 1200: Pig script failed to parse: 
        <line 3, column 4> pig script failed to validate: Unrecognized alias B
        

        I'd address this issue by changing the grammar so that assigning alias will not be possible (similar to SPLIT).

        Show
        Lorand Bendig added a comment - Thanks for fixing it! Now ASSERT works fine, however, grammar allows to assign alias to it which may lead to the following issue: A = load 'data.txt' AS (a0: int ,a1: int ,a2: int ); B = ASSERT A by a0 >= 5, 'a0 should be >= 5'; describe B; 14/02/09 10:21:04 ERROR grunt.Grunt: ERROR 1003: Unable to find an operator for alias B C = foreach B generate $0; 14/02/09 10:22:08 ERROR grunt.Grunt: ERROR 1200: Pig script failed to parse: <line 3, column 4> pig script failed to validate: Unrecognized alias B I'd address this issue by changing the grammar so that assigning alias will not be possible (similar to SPLIT).
        Hide
        Daniel Dai added a comment -

        Thank Lorand, that is right, there is no alias for assert in document. I committed your patch to trunk and branch 0.12.

        Show
        Daniel Dai added a comment - Thank Lorand, that is right, there is no alias for assert in document. I committed your patch to trunk and branch 0.12.
        Hide
        Aniket Mokashi added a comment -

        JFYI - b = store a into 'crap2'; - is also a correct syntax in pig trunk. Daniel Dai does this deserve a jira?

        Show
        Aniket Mokashi added a comment - JFYI - b = store a into 'crap2'; - is also a correct syntax in pig trunk. Daniel Dai does this deserve a jira?
        Hide
        Daniel Dai added a comment - - edited

        Ok, it is better to have an aliased version so that user can weave assert statement into pipeline. Can you open a new ticket for it?

        Show
        Daniel Dai added a comment - - edited Ok, it is better to have an aliased version so that user can weave assert statement into pipeline. Can you open a new ticket for it?
        Hide
        Aniket Mokashi added a comment -

        Ok, it is better to have an aliased version so that user can weave assert statement into pipeline.

        Do you mean we should support B = ASSERT A by a0 >= 5, 'a0 should be >= 5'; ?

        Show
        Aniket Mokashi added a comment - Ok, it is better to have an aliased version so that user can weave assert statement into pipeline. Do you mean we should support B = ASSERT A by a0 >= 5, 'a0 should be >= 5'; ?
        Hide
        Daniel Dai added a comment -

        I think we should allow the following script:

        A = load 'data.txt' AS (a0:int,a1:int,a2:int);
        B = ASSERT A by a0 >= 5, 'a0 should be >= 5'; 
        C = foreach B generate xxxx;
        

        How do you think?

        Show
        Daniel Dai added a comment - I think we should allow the following script: A = load 'data.txt' AS (a0: int ,a1: int ,a2: int ); B = ASSERT A by a0 >= 5, 'a0 should be >= 5'; C = foreach B generate xxxx; How do you think?
        Hide
        Lorand Bendig added a comment -

        I think Aniket Mokashi is right, it shouldn't be allowed to be able to assign an alias to a STORE, similarly to SPLIT. However, I also think that an aliased ASSERT makes the interpretation of the pipeline clearer. Then the assigned alias, which initially didn't have any meaning, could be interpreted in case of a dump/store as 'foreach A generate * which fails if a0 > 5' .

        Show
        Lorand Bendig added a comment - I think Aniket Mokashi is right, it shouldn't be allowed to be able to assign an alias to a STORE, similarly to SPLIT. However, I also think that an aliased ASSERT makes the interpretation of the pipeline clearer. Then the assigned alias, which initially didn't have any meaning, could be interpreted in case of a dump/store as 'foreach A generate * which fails if a0 > 5' .
        Hide
        Russell Jurney added a comment -

        Can someone please chime in and tell us what the syntax for ASSERT ended up being?

        Show
        Russell Jurney added a comment - Can someone please chime in and tell us what the syntax for ASSERT ended up being?
        Hide
        Russell Jurney added a comment -

        The syntax ended up being:

        ASSERT my_relation BY field1 IS NOT NULL, 'Oh no, field1 is null';

        Show
        Russell Jurney added a comment - The syntax ended up being: ASSERT my_relation BY field1 IS NOT NULL, 'Oh no, field1 is null';

          People

          • Assignee:
            Lorand Bendig
            Reporter:
            Daniel Dai
          • Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development