Pig
  1. Pig
  2. PIG-2290

TOBAG wraps tuple parameters in another tuple

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.0
    • Fix Version/s: 0.10.0
    • Component/s: internal-udfs
    • Labels:
      None
    • Patch Info:
      Patch Available
    • Release Note:
      If the argument to TOBAG is a tuple, it does not get inserted into another tuple before being added to the bag.

      Description

      The TOBAG function indiscriminately wraps all parameters in a tuple. When I pass a list of tuples to the function, I would expect it to return a bag containing those tuples. Instead, it returns a bag containing single element tuples, where each tuple contains one of the tuples passed in.

      Example:

      tuples.txt
      (mike,608)
      (ryan,11624)
      (justin,2317)
      
      Demonstration using pig 0.9.0
      grunt> TUPLE_DATA = LOAD 'tuples.txt' AS (T:tuple(name:chararray,street_number:int));
      grunt> BAGGED = FOREACH TUPLE_DATA GENERATE TOBAG(T);
      grunt> DESCRIBE BAGGED;
      BAGGED: {{(name: chararray,street_number: int)}}
      grunt> DUMP BAGGED;
      ({((mike,608))})
      ({((ryan,11624))})
      ({((justin,2317))})
      
      1. pig-2290.patch
        3 kB
        Ryan Hoegg
      2. pig-2290.1.patch
        4 kB
        Thejas M Nair

        Activity

        Transition Time In Source Status Execution Times Last Executer Last Execution Date
        Open Open Patch Available Patch Available
        6m 23s 1 Ryan Hoegg 16/Sep/11 15:49
        Patch Available Patch Available Resolved Resolved
        26d 3h 17m 1 Thejas M Nair 12/Oct/11 19:06
        Resolved Resolved Closed Closed
        197d 1h 26m 1 Daniel Dai 26/Apr/12 20:33
        Daniel Dai made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Thejas M Nair made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Release Note If the argument to TOBAG is a tuple, it does not get inserted into another tuple before being added to the bag.
        Resolution Fixed [ 1 ]
        Hide
        Thejas M Nair added a comment -

        +1 . Patch committed to trunk.
        Thanks for the contribution Ryan!

        Show
        Thejas M Nair added a comment - +1 . Patch committed to trunk. Thanks for the contribution Ryan!
        Thejas M Nair made changes -
        Fix Version/s 0.10 [ 12316246 ]
        Thejas M Nair made changes -
        Assignee Dmitriy V. Ryaboy [ dvryaboy ] Ryan Hoegg [ ryan.hoegg ]
        Thejas M Nair made changes -
        Attachment pig-2290.1.patch [ 12498786 ]
        Hide
        Thejas M Nair added a comment -

        pig-2290.1.patch - fix patch format issue, updated javadoc.

        Show
        Thejas M Nair added a comment - pig-2290.1.patch - fix patch format issue, updated javadoc.
        Hide
        Thejas M Nair added a comment -

        I didn't notice that the output does not match schema. That means one can't use the output of the TOBAG later in the query. So fixing this is not likely to have an impact on most of the existing queries. It will affect only some cases where the output of TOBAG is stored without further processing on it into a storage that does not store schema.
        But this use case is something that users would need to check for, when the upgrade. I would vote for making this change only in 0.10, and not for 0.9.1 .

        Show
        Thejas M Nair added a comment - I didn't notice that the output does not match schema. That means one can't use the output of the TOBAG later in the query. So fixing this is not likely to have an impact on most of the existing queries. It will affect only some cases where the output of TOBAG is stored without further processing on it into a storage that does not store schema. But this use case is something that users would need to check for, when the upgrade. I would vote for making this change only in 0.10, and not for 0.9.1 .
        Hide
        Ryan Hoegg added a comment -

        This is probably true, but the current implementation behaves inconsistently (output does not match schema). Plus, it makes it impossible to generate a bag containing tuples with more than one element.

        Is 0.9 -> 0.10 considered a "major version upgrade"?

        Show
        Ryan Hoegg added a comment - This is probably true, but the current implementation behaves inconsistently (output does not match schema). Plus, it makes it impossible to generate a bag containing tuples with more than one element. Is 0.9 -> 0.10 considered a "major version upgrade"?
        Hide
        Thejas M Nair added a comment -

        This change is not backward compatible. It can break existing pig queries.
        It can be argued that the TOBAG current implementation has a correct/consistent behavior - it puts each argument into a tuple and adds it to a bag. The bag always contains tuple with single element.

        I think there has to be a very strong reason to break backward compatibility. This could go into a new UDF (say TOBAG_2 ?). Or this could go into some major version upgrade of pig where we make bunch of non backward compatible changes.

        Show
        Thejas M Nair added a comment - This change is not backward compatible. It can break existing pig queries. It can be argued that the TOBAG current implementation has a correct/consistent behavior - it puts each argument into a tuple and adds it to a bag. The bag always contains tuple with single element. I think there has to be a very strong reason to break backward compatibility. This could go into a new UDF (say TOBAG_2 ?). Or this could go into some major version upgrade of pig where we make bunch of non backward compatible changes.
        Ryan Hoegg made changes -
        Assignee Ryan Hoegg [ ryan.hoegg ] Dmitriy V. Ryaboy [ dvryaboy ]
        Ryan Hoegg made changes -
        Attachment pig-2290.patch [ 12494808 ]
        Ryan Hoegg made changes -
        Attachment pig-2290.patch [ 12494850 ]
        Hide
        Ryan Hoegg added a comment -

        Fixed formatting in TOBAG.java

        Show
        Ryan Hoegg added a comment - Fixed formatting in TOBAG.java
        Dmitriy V. Ryaboy made changes -
        Assignee Ryan Hoegg [ ryan.hoegg ]
        Hide
        Dmitriy V. Ryaboy added a comment -

        Looks good, though the formatting is a little off:

        +                } else {
                         Tuple tp2 = TupleFactory.getInstance().newTuple(1);
                         tp2.set(0, object);
                         bag.add(tp2);
                     }
        +            }
        

        The else block is not indented properly.

        Ok to put in trunk or do you feel this must go into 0.9.1?

        Show
        Dmitriy V. Ryaboy added a comment - Looks good, though the formatting is a little off: + } else { Tuple tp2 = TupleFactory.getInstance().newTuple(1); tp2.set(0, object); bag.add(tp2); } + } The else block is not indented properly. Ok to put in trunk or do you feel this must go into 0.9.1?
        Ryan Hoegg made changes -
        Patch Info [Patch Available]
        Ryan Hoegg made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Ryan Hoegg made changes -
        Field Original Value New Value
        Attachment pig-2290.patch [ 12494808 ]
        Hide
        Ryan Hoegg added a comment -

        Patch to TOBAG function to check if the parameter is a tuple before creating one, including unit test

        Show
        Ryan Hoegg added a comment - Patch to TOBAG function to check if the parameter is a tuple before creating one, including unit test
        Ryan Hoegg created issue -

          People

          • Assignee:
            Ryan Hoegg
            Reporter:
            Ryan Hoegg
          • Votes:
            2 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development