Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.11
    • Fix Version/s: 0.12.0
    • Component/s: internal-udfs, parser
    • Labels:
      None
    • Release Note:
      Pig now supports the CASE expression. It can be used anywhere an expression is allowed. For example:
      bar = FOREACH foo GENERATE (
        CASE i % 3
           WHEN 0 THEN '3n'
           WHEN 1 THEN '3n+1'
           ELSE '3n+2'
        END
      );

      Note that CASE is now a reserved keyword and can therefore no longer be used as a column or field name.

      Description

      Currently, Pig does not support a case statement. To mimic one, users often nest bincond operators, but this quickly becomes unreadable when there are multiple levels of nesting.

      For example,

      a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
      b = FOREACH a GENERATE (
          i % 3 == 0 ? '3n' : (i % 3 == 1 ? '3n + 1' : '3n + 2')
      );
      

      This can be rewritten much more readably with a case statement:

      a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
      b = FOREACH a GENERATE (
          CASE i % 3
              WHEN 0 THEN '3n'
              WHEN 1 THEN '3n + 1'
              ELSE        '3n + 2'
          END
      );
      

      I propose that we implement the case statement as follows:

      • Add built-in UDFs that take expressions as args. For the case statement above, for example, we can define a UDF such as builtInUdf(i % 3, 0, '3n', 1, '3n + 1', '3n + 2').
      • Add syntactic sugar for these built-in UDFs.
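To make the flattened argument layout in the first bullet concrete, here is a minimal Java sketch. It is not the code from the attached patches and does not use Pig's EvalFunc API; the class name CaseUdfSketch and the method evalCase are hypothetical. It assumes the layout shown above (subject first, then WHEN/THEN pairs, then an optional trailing ELSE value) and, as a further assumption, returns null when nothing matches and no ELSE value is given.

```java
import java.util.Objects;

// Hypothetical illustration only; not Pig's API and not the patch's implementation.
final class CaseUdfSketch {

    // args[0] is the subject, followed by (when, then) pairs and an optional ELSE value.
    static Object evalCase(Object... args) {
        if (args.length < 3) {
            throw new IllegalArgumentException("need a subject and at least one WHEN/THEN pair");
        }
        Object subject = args[0];
        // subject + 2n pair values is odd; one extra trailing ELSE value makes it even.
        boolean hasElse = args.length % 2 == 0;
        int pairsEnd = hasElse ? args.length - 1 : args.length;
        for (int i = 1; i < pairsEnd; i += 2) {
            if (Objects.equals(subject, args[i])) {
                return args[i + 1]; // first matching WHEN wins
            }
        }
        // Assumption: no match and no ELSE yields null.
        return hasElse ? args[args.length - 1] : null;
    }

    public static void main(String[] unused) {
        for (int i = 0; i < 6; i++) {
            System.out.println(i + " -> " + evalCase(i % 3, 0, "3n", 1, "3n + 1", "3n + 2"));
        }
    }
}
```

The call in main mirrors builtInUdf(i % 3, 0, '3n', 1, '3n + 1', '3n + 2'); the CASE ... END syntax would then be sugar that the parser rewrites into such a call.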

      In fact, I borrowed this idea from HIVE-164.

      One downside of this approach is that all the possible args schemas of these UDFs must be pre-computed. Specifically, we need to populate the full list of possible args schemas in EvalFunc.getArgToFuncMapping.

      In particular, since we cannot support arbitrarily long argument lists, we must limit the number of WHEN branches. For now, I arbitrarily chose 50, but this can easily be changed.
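As a rough illustration of why a branch limit bounds the pre-computation, the sketch below lists the argument counts a flattened call could have. It assumes the layout from the builtInUdf example (one subject, two values per WHEN branch, an optional ELSE value) and ignores argument types, which the real schema list would also have to cover. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; counts argument-list lengths, not actual Pig schemas.
final class CaseArgCountSketch {

    static List<Integer> argCounts(int maxBranches) {
        List<Integer> counts = new ArrayList<>();
        for (int n = 1; n <= maxBranches; n++) {
            counts.add(1 + 2 * n);     // subject + n WHEN/THEN pairs, no ELSE
            counts.add(1 + 2 * n + 1); // same, plus a trailing ELSE value
        }
        return counts;
    }

    public static void main(String[] unused) {
        List<Integer> counts = argCounts(50);
        System.out.println(counts.size() + " argument counts, from "
                + counts.get(0) + " to " + counts.get(counts.size() - 1));
    }
}
```

Under these assumptions a limit of 50 branches gives argument lists of 3 to 102 values, so the set of shapes to register is finite, and raising the limit only lengthens the list.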

      1. PIG-3268.patch
        25 kB
        Cheolsoo Park
      2. PIG-3268-2.patch
        47 kB
        Cheolsoo Park
      3. PIG-3268-3.patch
        15 kB
        Cheolsoo Park
      4. PIG-3268-4.patch
        15 kB
        Cheolsoo Park
      5. PIG-3268-5.patch
        17 kB
        Cheolsoo Park
      6. PIG-3268-6.patch
        17 kB
        Cheolsoo Park
      7. PIG-3268-7.patch
        17 kB
        Cheolsoo Park

            People

            • Assignee:
              Cheolsoo Park
              Reporter:
              Cheolsoo Park
            • Votes:
              2
              Watchers:
              3
