Type: New Feature
Affects Version/s: 0.11
Fix Version/s: 0.12.0
Release Note: Pig now supports the CASE expression. It can be used in place of any expression. For example,
bar = FOREACH foo GENERATE (
          CASE i % 3
            WHEN 0 THEN '3n'
            WHEN 1 THEN '3n+1'
            ELSE        '3n+2'
          END
      );
Note that CASE is now a reserved keyword and therefore can no longer be used as a column or field name.
Currently, Pig has no support for a case statement. To mimic one, users often nest bincond operators ((cond ? a : b)). However, nested binconds quickly become unreadable once there are several levels of nesting.
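To make the readability problem concrete, here is a minimal Java sketch that mechanically rewrites a simple case statement (a switch expression, WHEN values, THEN results, and an ELSE result) into the equivalent nested bincond string a user would otherwise type by hand. The class and method names (CaseToBincond, toBincond) are hypothetical and exist only for illustration; they are not part of Pig.

```java
import java.util.List;

public class CaseToBincond {
    // Hypothetical helper: rewrites CASE switchExpr WHEN w1 THEN t1 ... ELSE e END
    // into the equivalent nested Pig bincond string.
    static String toBincond(String switchExpr, List<String> whens,
                            List<String> thens, String elseExpr) {
        if (whens.size() != thens.size()) {
            throw new IllegalArgumentException("each WHEN needs a matching THEN");
        }
        // Build from the innermost branch (the ELSE) outwards, so that
        // branch i wraps all of branches i+1..n inside its false arm.
        String result = elseExpr;
        for (int i = whens.size() - 1; i >= 0; i--) {
            result = "(" + switchExpr + " == " + whens.get(i) + " ? "
                    + thens.get(i) + " : " + result + ")";
        }
        return result;
    }

    public static void main(String[] args) {
        String bincond = toBincond("i % 3", List.of("0", "1"),
                List.of("'3n'", "'3n+1'"), "'3n+2'");
        // Prints: (i % 3 == 0 ? '3n' : (i % 3 == 1 ? '3n+1' : '3n+2'))
        System.out.println(bincond);
    }
}
```

Each additional WHEN branch adds another level of parentheses and repeats the switch expression, which is why hand-written nested binconds degrade so quickly.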
This can be rewritten much more readably using a case statement as follows:
I propose that we implement the case statement as follows:
- Add built-in UDFs that take expressions as arguments. For example, for the case statement above, we could define a UDF such as builtInUdf(i % 3, 0, '3n', 1, '3n + 1', '3n + 2').
- Add syntactic sugar for these built-in UDFs.
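The evaluation logic such a UDF would need can be sketched in plain Java, independent of Pig's EvalFunc API. This is a minimal sketch assuming the argument layout implied by the example call: the switch value first, then WHEN/THEN pairs, then an optional trailing ELSE (detected by an odd number of arguments after the switch value). Returning null when nothing matches and no ELSE is given is also an assumption made here. The names CaseUdfSketch and evalCase are hypothetical.

```java
import java.util.Objects;

public class CaseUdfSketch {
    /**
     * Hypothetical evaluation logic for the proposed built-in CASE UDF.
     * Assumed argument layout:
     *   args[0]           switch value (e.g. the result of i % 3)
     *   args[1], args[2]  first WHEN value and its THEN result
     *   ...               further WHEN/THEN pairs
     *   last (optional)   ELSE result, present when the count after args[0] is odd
     * Returns null if no WHEN matches and there is no ELSE (an assumption).
     */
    static Object evalCase(Object... args) {
        if (args.length < 3) {
            throw new IllegalArgumentException(
                    "need a switch value and at least one WHEN/THEN pair");
        }
        Object switchValue = args[0];
        int rest = args.length - 1;          // arguments after the switch value
        boolean hasElse = rest % 2 == 1;     // odd remainder => trailing ELSE
        int pairs = rest / 2;
        for (int p = 0; p < pairs; p++) {
            Object when = args[1 + 2 * p];
            if (Objects.equals(switchValue, when)) {
                return args[2 + 2 * p];      // first matching WHEN wins
            }
        }
        return hasElse ? args[args.length - 1] : null;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            // Mirrors builtInUdf(i % 3, 0, '3n', 1, '3n + 1', '3n + 2')
            System.out.println(i + " -> "
                    + evalCase(i % 3, 0, "3n", 1, "3n + 1", "3n + 2"));
        }
    }
}
```

Because matching uses Objects.equals, an Integer switch value never matches a Long WHEN value; in a real implementation, argument types would have to be reconciled through the UDF's declared schemas, which is what motivates the schema enumeration discussed below.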
In fact, I borrowed this idea from
One downside of this approach is that every possible argument schema of these UDFs must be precomputed. Specifically, we need to populate the full list of possible argument schemas in EvalFunc.getArgToFuncMapping.
In particular, since we cannot support an unbounded number of arguments, we must impose a limit on the number of WHEN branches. For now, I arbitrarily chose 50, but this can easily be changed.
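To give a rough sense of how many entries the precomputed mapping needs, the following Java sketch enumerates the argument counts a CASE UDF would have to register for up to 50 WHEN branches, with and without an ELSE. It does not use Pig's actual getArgToFuncMapping or FuncSpec types; the class name, the typeCount parameter, and the simplifying assumption that all WHEN values share one type and all results share another are hypothetical and serve only to illustrate the growth.

```java
import java.util.ArrayList;
import java.util.List;

public class CaseArityTable {
    // Upper bound on WHEN branches, matching the limit chosen above.
    static final int MAX_WHEN_BRANCHES = 50;

    /**
     * Lists every argument count the hypothetical CASE UDF would have to register:
     * one switch value plus two arguments per WHEN/THEN pair, either without
     * or with a trailing ELSE argument.
     */
    static List<Integer> supportedArities(int maxBranches) {
        List<Integer> arities = new ArrayList<>();
        for (int branches = 1; branches <= maxBranches; branches++) {
            arities.add(1 + 2 * branches);   // without ELSE
            arities.add(2 + 2 * branches);   // with ELSE
        }
        return arities;
    }

    /**
     * Hypothetical schema count: assumes the switch/WHEN values share one of
     * typeCount types and all results share one of typeCount types, so each
     * arity needs typeCount * typeCount schemas.
     */
    static long schemaCount(int maxBranches, int typeCount) {
        return (long) supportedArities(maxBranches).size() * typeCount * typeCount;
    }

    public static void main(String[] args) {
        List<Integer> arities = supportedArities(MAX_WHEN_BRANCHES);
        // Prints: 100 arities, from 3 to 102
        System.out.println(arities.size() + " arities, from " + arities.get(0)
                + " to " + arities.get(arities.size() - 1));
        // Prints: 10000 (with a hypothetical 10 types per position)
        System.out.println(schemaCount(MAX_WHEN_BRANCHES, 10));
    }
}
```

Even under this simplified model, the table grows with both the branch limit and the square of the number of types, which is why the limit needs to be fixed up front.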