Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11
    • Fix Version/s: 0.12.0
    • Component/s: internal-udfs, parser
    • Labels:
      None
    • Release Note:
      Pig now supports the IN operator, which can be used in any conditional expression. For example,
      bar = FILTER foo BY i IN ('a', 'b', 'c');

      Description

      This is another language improvement using the same approach as in PIG-3268.

      Currently, Pig does not support the IN operator. To mimic it, users often have to chain several OR operators.

      For example,

      a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
      b = FILTER a BY 
         (i == 1) OR
         (i == 22) OR
         (i == 333) OR
         (i == 4444) OR
         (i == 55555);
      

      This can be rewritten more compactly with the IN operator as follows:

      a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
      b = FILTER a BY i IN (1,22,333,4444,55555);
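A minimal Java sketch of the intended semantics, assuming plain Java equality (Pig's null handling and type casting are not modeled): i IN (v1, ..., vn) is true exactly when one of the OR-ed equality tests is true. The class and method names are hypothetical and not part of Pig.

```java
import java.util.Objects;

// Hypothetical illustration of IN semantics; these methods are not Pig APIs.
final class InVersusOrSketch {

    // The OR-chain form from the first script, written out by hand.
    static boolean orChain(int i) {
        return (i == 1) || (i == 22) || (i == 333) || (i == 4444) || (i == 55555);
    }

    // The IN form: true if the value equals any candidate.
    // Objects.equals compares boxed values by value; Pig's null semantics are not modeled.
    static boolean isIn(Object value, Object... candidates) {
        for (Object candidate : candidates) {
            if (Objects.equals(value, candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 22, 23, 333, 4444, 55555, 55556};
        for (int i : samples) {
            boolean viaIn = isIn(i, 1, 22, 333, 4444, 55555);
            // Print both results so they can be compared side by side.
            System.out.println(i + ": IN=" + viaIn + ", OR chain=" + orChain(i));
        }
    }
}
```

In this model both forms accept the same values; the IN form simply avoids repeating "i ==" for every value.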
      

      I propose implementing the IN operator as follows:

      • Add built-in UDFs that take expressions as arguments. For the IN operator above, for example, we can define a UDF such as builtInUdf(i, 1, 22, 333, 4444, 55555).
      • Add syntactic sugar that maps the IN syntax onto these built-in UDFs.
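The two-step proposal can be sketched in Java as follows. This is an illustration under stated assumptions, not the patch: builtInUdf is the placeholder name from the proposal, a List stands in for Pig's Tuple, and desugar is a hypothetical helper showing the rewrite a parser rule would perform. A real implementation would presumably extend EvalFunc<Boolean>.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the proposal; not the code that was committed to Pig.
final class InSugarSketch {

    // Step 1: a built-in function whose arguments are all expressions.
    // args.get(0) is the tested value; the remaining entries are the candidates.
    // A real Pig UDF would extend EvalFunc<Boolean> and read these from a Tuple.
    static boolean builtInUdf(List<?> args) {
        if (args.isEmpty()) {
            throw new IllegalArgumentException("expected at least the tested expression");
        }
        Object value = args.get(0);
        for (Object candidate : args.subList(1, args.size())) {
            if (Objects.equals(value, candidate)) {
                return true;
            }
        }
        return false;
    }

    // Step 2: syntactic sugar. The surface form "lhs IN (v1, v2, ...)"
    // is mapped onto a call of the built-in function.
    static String desugar(String lhs, List<String> values) {
        return "builtInUdf(" + lhs + ", " + String.join(", ", values) + ")";
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("1", "22", "333", "4444", "55555");
        System.out.println(desugar("i", values)); // builtInUdf(i, 1, 22, 333, 4444, 55555)
        System.out.println(builtInUdf(Arrays.asList(333, 1, 22, 333, 4444, 55555))); // true
        System.out.println(builtInUdf(Arrays.asList(7, 1, 22, 333, 4444, 55555)));   // false
    }
}
```

Keeping the tested expression as the first argument lets one function cover any number of candidates, while the sugar keeps the user-facing syntax unchanged.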
      Attachments

      1. PIG-3269.patch
        7 kB
        Cheolsoo Park
      2. PIG-3269-2.patch
        6 kB
        Cheolsoo Park
      3. PIG-3269-3.patch
        17 kB
        Cheolsoo Park
      4. PIG-3269-4.patch
        17 kB
        Cheolsoo Park
      5. PIG-3269-5.patch
        16 kB
        Cheolsoo Park

          Activity

          Cheolsoo Park created issue -
          Cheolsoo Park made changes -
          Field Original Value New Value
          Description: the first proposal bullet originally read "Take for example the aforementioned case statement" and was corrected to "Take for example the aforementioned IN operator". The rest of the text matched the current description, plus this closing paragraph:

          As with PIG-3268, this approach requires a limit on the number of values, again because we need to populate the full list of possible argument schemas in {{EvalFunc.getArgToFuncMapping}}. For now, I arbitrarily chose 50, but this can easily be changed.
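The limit described in the paragraph above can be illustrated with a simplified Java model, assuming each supported number of values needs its own entry in the list that {{EvalFunc.getArgToFuncMapping}} returns. Strings stand in for Pig's FuncSpec and Schema objects, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model: strings stand in for Pig's FuncSpec/Schema objects.
// All names here are hypothetical.
final class ArgSchemaCapSketch {

    // The arbitrary cap mentioned in the description.
    static final int MAX_VALUES = 50;

    // Signature with the given number of fields, e.g. "(int, int, int)".
    static String signature(String type, int fields) {
        return "(" + String.join(", ", Collections.nCopies(fields, type)) + ")";
    }

    // Enumerate one signature per supported arity:
    // the tested expression plus 1..MAX_VALUES candidate values.
    static List<String> argSchemas(String type) {
        List<String> schemas = new ArrayList<>();
        for (int n = 1; n <= MAX_VALUES; n++) {
            schemas.add(signature(type, n + 1));
        }
        return schemas;
    }

    // A call resolves only if a pre-listed signature has exactly its arity.
    static boolean resolves(List<String> schemas, String type, int valueCount) {
        return schemas.contains(signature(type, valueCount + 1));
    }

    public static void main(String[] args) {
        List<String> schemas = argSchemas("int");
        System.out.println(schemas.size());               // 50
        System.out.println(schemas.get(0));               // (int, int)
        System.out.println(resolves(schemas, "int", 50)); // true
        System.out.println(resolves(schemas, "int", 51)); // false: one past the cap
    }
}
```

Because the list is finite, a call with 51 values matches no listed signature in this model; raising the cap only means generating more entries.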
          Cheolsoo Park made changes -
          Attachment PIG-3269.patch [ 12577470 ]
          Cheolsoo Park made changes -
          Attachment PIG-3269-2.patch [ 12577514 ]
          Cheolsoo Park made changes -
          Description: removed the closing paragraph about the 50-value limit; the rest of the description was unchanged.
          Cheolsoo Park made changes -
          Attachment PIG-3269-3.patch [ 12577536 ]
          Cheolsoo Park made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Cheolsoo Park made changes -
          Attachment PIG-3269-4.patch [ 12577895 ]
          Cheolsoo Park made changes -
          Attachment PIG-3269-5.patch [ 12579151 ]
          Cheolsoo Park made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Cheolsoo Park made changes -
          Release Note Pig now supports the IN operator, which can be used in any conditional expression. For example,
          {code}
          bar = FILTER foo BY i IN ('a', 'b', 'c');
          {code}
          Cheolsoo Park made changes -
          Release Note: removed the {code} markup around the example; the text was otherwise unchanged.
          Cheolsoo Park made changes -
          Link This issue is related to PIG-3280 [ PIG-3280 ]
          Cheolsoo Park made changes -
          Link This issue relates to PIG-3336 [ PIG-3336 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Cheolsoo Park
              Reporter:
              Cheolsoo Park
            • Votes:
              1
              Watchers:
              4
