Pig / PIG-2205

Improve error checking around Scalar functionality

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      The scalar-loading feature added in 0.8 has introduced a new class of problems that show up when users apply the "foo.bar" syntax incorrectly.

        Activity

        Raghu Angadi added a comment -

        PIG-1967 is the long-term solution (unless we are going to bite the bullet and make casting mandatory).

        Until then, this would be great to have! I am hoping this would result in a hard error in most cases.

        Dmitriy V. Ryaboy added a comment -

        Thejas – yeah I guess it's a duplicate, though perhaps we should keep 2 separate issues since we are suggesting 2 different fixes that complement each other.

        PIG-1967 is for requiring casts and deprecating the no-cast.

        This ticket is about also checking what the reference actually means, and displaying a helpful error message if you are referring to your own relation or column, rather than printing a deprecation message and trying anyway.

        Dmitriy V. Ryaboy added a comment -

        It used to be that something like the script below was thrown out by the parser:

        filtered = filter my_relation by my_relation.x > 2;
        

        We now try to actually evaluate that, treating my_relation as a scalar to be loaded up while iterating over my_relation.
        We should instead suggest that the user probably wanted to write

        filtered = filter my_relation by x > 2;
        

        A similar problem occurs in this code:

        joined = join a by id, b by id;
        projected = foreach joined generate a.id;
        

        Naturally, the user actually meant

        projected = foreach joined generate a::id;
        

        Instead of erroring out, we currently generate massive plans that involve lots of splits (I saw a 5-line script filled with this syntax mistake generate 12 jobs!), and fail eventually with "Scalar has more than one row in the output" – which doesn't help a user who is not advanced enough to know about Scalars.

        This is extra confusing to people coming from a SQL background, who are of course extremely used to referring to their tables' fields this way.
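
        For contrast, here is a minimal sketch of what the "foo.bar" syntax is meant for: reading a value out of a relation that holds exactly one row. The file name, schema, and aliases below are hypothetical and chosen only for illustration.

```pig
-- Hypothetical input; the path and schema are assumptions for illustration.
events = load 'events.txt' as (id:int, x:int);

-- Collapse the relation to a single row holding the total of x.
grouped = group events all;
totals = foreach grouped generate SUM(events.x) as total_x;

-- Intended scalar use: totals has exactly one row, so totals.total_x
-- can be read as a single value while iterating over events.
shares = foreach events generate id, (double)x / (double)totals.total_x;
```

        Here totals is a different relation from the one being iterated, and it has a single row. The two mistakes above instead point the same syntax at the relation being filtered or at a join input, both of which normally have many rows.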

        Thejas M Nair added a comment -

        Would the proposal in PIG-1967 help? (Is this a duplicate?)


          People

          • Assignee: Unassigned
          • Reporter: Dmitriy V. Ryaboy
          • Votes: 1
          • Watchers: 1
