Pig
  1. Pig
  2. PIG-1618 Switch to new parser generator technology
  3. PIG-1765

as_clause in foreach statement should differentiate between simple type and type within tuple

    Details

    • Type: Sub-task Sub-task
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      In new parser changes, the following statements are treated as same -

      f = foreach l generate a as aa :int; – here the column is now called aa and has type int

      f = foreach l generate a as (aa :int); – this should mean that the column has type "tuple with column aa of type int"

      With old parser the 2nd statement results in syntax error, which is fine, because it requires name part .

      The parenthesis represent tuple in pig. We should deprecate support for load statement that takes schema without the parenthesis part , such as following example -
      l = load 'x' as a:int – It should be as (a :int) , it is treated as such but this is inconsistent syntax.

        Activity

        Hide
        Xuefu Zhang added a comment -

        Per discussion above, there is no more issue regarding this.

        Show
        Xuefu Zhang added a comment - Per discussion above, there is no more issue regarding this.
        Hide
        Xuefu Zhang added a comment -

        The issue about foreach generate a as aa:int vs (aa:int) was once there but quickly addressed. Right now, we have exactly the same behavior as the old parser.

        In addition, B = foreach A generate (A.x); doesn't parse in the new parser, but parses in the old one. I think it should be invalid because A is used both as a relation as well as a scalar. If A has only one row, B = foreach A generate x will give the same result.

        We can deprecate A = load 'foo' as x:int; in the new parser.

        Show
        Xuefu Zhang added a comment - The issue about foreach generate a as aa:int vs (aa:int) was once there but quickly addressed. Right now, we have exactly the same behavior as the old parser. In addition, B = foreach A generate (A.x); doesn't parse in the new parser, but parses in the old one. I think it should be invalid because A is used both as a relation as well as a scalar. If A has only one row, B = foreach A generate x will give the same result. We can deprecate A = load 'foo' as x:int; in the new parser.
        Hide
        Olga Natkovich added a comment -

        Xuefu, can we just follow Alan's suggestion and test to make sure we are not breaking compatibility on this one.

        Show
        Olga Natkovich added a comment - Xuefu, can we just follow Alan's suggestion and test to make sure we are not breaking compatibility on this one.
        Hide
        Alan Gates added a comment -

        We don't know what we want to do in the long term here. Currently we do not have consistent semantics with parenthesis in Pig Latin. In the long term I think I would vote for () always meaning a tuple. But we need more discussion and thought before we settle on that. We are ready to push this into 0.9.

        For 0.9 it will be adequate to make sure we don't change behavior from 0.8. That is:

        B = foreach A generate (A.x);
        

        should still gives an error. We could also easily deprecate the

        A = load 'foo' as x:int;
        

        behavior.

        Show
        Alan Gates added a comment - We don't know what we want to do in the long term here. Currently we do not have consistent semantics with parenthesis in Pig Latin. In the long term I think I would vote for () always meaning a tuple. But we need more discussion and thought before we settle on that. We are ready to push this into 0.9. For 0.9 it will be adequate to make sure we don't change behavior from 0.8. That is: B = foreach A generate (A.x); should still gives an error. We could also easily deprecate the A = load 'foo' as x: int ; behavior.
        Hide
        Daniel Dai added a comment -

        If we are going to change, we also need to address the following situation as well:

        f = foreach l generate flatten(a) as (a0:int, a1:int); – here user do not mean to generate a tuple, but to name different component of flattened bag, we need to way to express this

        Show
        Daniel Dai added a comment - If we are going to change, we also need to address the following situation as well: f = foreach l generate flatten(a) as (a0:int, a1:int); – here user do not mean to generate a tuple, but to name different component of flattened bag, we need to way to express this
        Hide
        Alan Gates added a comment -

        This is going to be very non-intuitive to people. Based on 20+ years of mathematics most programmers expect x = to be true. We are saying in this case it isn't in this situation. I know this is not a mathematical expression, but many people will not see the difference. I agree consistent syntax between load as and foreach as is greatly desirable, so requiring the name or keyword tuple in the foreach as is not good. So maybe there is no way around this. But we should at least make sure the documentation is very specific, and maybe even issue warnings in the case, to make sure people know.

        Show
        Alan Gates added a comment - This is going to be very non-intuitive to people. Based on 20+ years of mathematics most programmers expect x = to be true. We are saying in this case it isn't in this situation. I know this is not a mathematical expression, but many people will not see the difference. I agree consistent syntax between load as and foreach as is greatly desirable, so requiring the name or keyword tuple in the foreach as is not good. So maybe there is no way around this. But we should at least make sure the documentation is very specific, and maybe even issue warnings in the case, to make sure people know.

          People

          • Assignee:
            Xuefu Zhang
            Reporter:
            Thejas M Nair
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development