Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11
    • Fix Version/s: 0.12.0
    • Component/s: internal-udfs, parser
    • Labels:
      None
    • Release Note:
      Pig now supports the CASE expression. It can be used anywhere an expression is allowed. For example:
      bar = FOREACH foo GENERATE (
        CASE i % 3
           WHEN 0 THEN '3n'
           WHEN 1 THEN '3n+1'
           ELSE '3n+2'
        END
      );

      Note that CASE is now a reserved keyword, so it can no longer be used as a column or field name.

      Description

      Currently, Pig has no support for a case statement. To mimic it, users often nest bincond operators, but that quickly becomes unreadable when there are multiple levels of nesting.

      For example,

      a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
      b = FOREACH a GENERATE (
          i % 3 == 0 ? '3n' : (i % 3 == 1 ? '3n + 1' : '3n + 2')
      );
      

      This can be rewritten much more readably using a case statement, as follows:

      a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
      b = FOREACH a GENERATE (
          CASE i % 3
              WHEN 0 THEN '3n'
              WHEN 1 THEN '3n + 1'
              ELSE        '3n + 2'
          END
      );
      

      I propose that we implement the case statement in the following manner:

      • Add built-in UDFs that take expressions as args. For example, for the case statement above, we can define a UDF such as builtInUdf(i % 3, 0, '3n', 1, '3n + 1', '3n + 2').
      • Add syntactic sugar for these built-in UDFs.
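As a rough illustration of the first bullet, the flattened argument list (subject, then WHEN/THEN pairs, then an optional ELSE value) could be evaluated along the lines below. This is only a sketch in Python, not Pig's Java UDF API; the name `case_udf` is hypothetical, and returning None (null) when nothing matches and there is no ELSE is an assumption rather than something stated in this issue.

```python
def case_udf(*args):
    """Evaluate a flattened CASE: (subject, when1, then1, ..., [else_value])."""
    if len(args) < 3:
        raise ValueError("need a subject and at least one WHEN/THEN pair")
    subject, rest = args[0], args[1:]
    # An odd number of remaining args means a trailing ELSE value is present.
    has_else = len(rest) % 2 == 1
    pairs = rest[:-1] if has_else else rest
    for i in range(0, len(pairs), 2):
        when, then = pairs[i], pairs[i + 1]
        if subject == when:
            return then
    # No branch matched: use the ELSE value if present, otherwise None
    # (assumed null semantics, not confirmed by the issue).
    return rest[-1] if has_else else None


# Same call shape as the builtInUdf(...) example in the bullet above.
labels = [case_udf(i % 3, 0, '3n', 1, '3n + 1', '3n + 2') for i in range(4)]
```

The syntactic sugar in the second bullet would then only need to rewrite a CASE ... END block into this call shape.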

      In fact, I borrowed this idea from HIVE-164.

      One downside of this approach is that all possible argument schemas of these UDFs must be pre-computed. Specifically, we need to populate the full list of possible argument schemas in EvalFunc.getArgToFuncMapping.

      In particular, since we obviously cannot support infinitely long argument lists, it is necessary to impose a limit on the number of WHEN branches. For now, I arbitrarily chose 50, but it can be easily changed.
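To make the cost of pre-computing concrete, the sketch below counts only the argument arities a flattened CASE UDF would have to accept under a cap of 50 WHEN branches. It assumes the call shape from the builtInUdf example and an optional ELSE; the names `MAX_WHEN_BRANCHES` and `supported_arities` are hypothetical, and the real schema entries would also vary by argument type, which is not modeled here.

```python
MAX_WHEN_BRANCHES = 50  # the arbitrary limit mentioned in the description


def supported_arities(max_branches=MAX_WHEN_BRANCHES):
    """List the argument counts a flattened CASE UDF would need to accept."""
    arities = []
    for k in range(1, max_branches + 1):
        arities.append(1 + 2 * k)      # subject + k WHEN/THEN pairs, no ELSE
        arities.append(1 + 2 * k + 1)  # same, plus a trailing ELSE value
    return arities


arities = supported_arities()
```

Every extra WHEN branch adds two more call shapes, so the list grows linearly with the cap before types are even considered, which is why some fixed limit has to be chosen.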

      1. PIG-3268.patch
        25 kB
        Cheolsoo Park
      2. PIG-3268-2.patch
        47 kB
        Cheolsoo Park
      3. PIG-3268-3.patch
        15 kB
        Cheolsoo Park
      4. PIG-3268-4.patch
        15 kB
        Cheolsoo Park
      5. PIG-3268-5.patch
        17 kB
        Cheolsoo Park
      6. PIG-3268-6.patch
        17 kB
        Cheolsoo Park
      7. PIG-3268-7.patch
        17 kB
        Cheolsoo Park

        Issue Links

          Activity

          Cheolsoo Park created issue -
          Cheolsoo Park made changes -
          Field Original Value New Value
          Attachment PIG-3268.patch [ 12577424 ]
          Cheolsoo Park made changes -
          Description (only the final paragraph changed)
          Original Value: In particular, since we obviously cannot support infinitely long args, it is necessary to impose a limit on level of nesting. For now, I arbitrarily chose 50, but it can be easily changed.
          New Value: In particular, since we obviously cannot support infinitely long args, it is necessary to impose a limit on the size of when branches. For now, I arbitrarily chose 50, but it can be easily changed.
          Cheolsoo Park made changes -
          Attachment PIG-3268-2.patch [ 12577572 ]
          Cheolsoo Park made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Cheolsoo Park made changes -
          Attachment PIG-3268-3.patch [ 12577943 ]
          Cheolsoo Park made changes -
          Attachment PIG-3268-4.patch [ 12577946 ]
          Cheolsoo Park made changes -
          Status Patch Available [ 10002 ] Open [ 1 ]
          Cheolsoo Park made changes -
          Attachment PIG-3268-5.patch [ 12578324 ]
          Cheolsoo Park made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Cheolsoo Park made changes -
          Attachment PIG-3268-6.patch [ 12578334 ]
          Cheolsoo Park made changes -
          Attachment PIG-3268-7.patch [ 12579357 ]
          Cheolsoo Park made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Cheolsoo Park made changes -
          Release Note (initial text; same example as the release note above, except that the second branch read WHEN 0 THEN '3n+1')
          Cheolsoo Park made changes -
          Release Note (only the second branch of the example changed)
          Original Value: WHEN 0 THEN '3n+1'
          New Value: WHEN 1 THEN '3n+1'
          Cheolsoo Park made changes -
          Link This issue is related to PIG-3280 [ PIG-3280 ]
          Cheolsoo Park made changes -
          Link This issue relates to PIG-3342 [ PIG-3342 ]
          Daniel Dai made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee:
              Cheolsoo Park
              Reporter:
              Cheolsoo Park
            • Votes:
              2
              Watchers:
              3
