Project-range ( '..' ) can be used to project a range of columns from the input.
For example, the expressions -
.. $x : projects columns $0 through $x, inclusive
$x .. : projects columns $x through the last column, inclusive
$x .. $y : projects columns $x through $y, inclusive
If the input relation has a schema, you can refer to columns by alias instead of by position. You can also mix aliases and column positions in a single project-range expression (e.g., "col1 .. $5" is valid).
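A minimal sketch of the forms described above. The relation name INP, the file 'data.txt' and the field names col0 through col5 are assumptions made for illustration only.
{code}
-- Hypothetical input with six untyped fields (names and path are assumed).
INP = load 'data.txt' as (col0, col1, col2, col3, col4, col5);
-- Start-of-row to $2: projects col0, col1, col2
A = foreach INP generate .. $2;
-- $3 to end-of-row: projects col3, col4, col5
B = foreach INP generate $3 ..;
-- Bounded range: projects col1, col2, col3
C = foreach INP generate $1 .. $3;
-- Alias mixed with position: projects col1, col2, col3, col4
D = foreach INP generate col1 .. $4;
{code}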
This expression can be used wherever '*' (project-star) is allowed, except as a UDF argument. Support for that use case will be added in PIG-1938.
It can be used in the following statements -
- foreach
- join
- order (also when it is within a nested foreach block)
- group/cogroup
Examples -
{code}
grunt> F = foreach IN generate (int)col0, col1 .. col3;
grunt> describe F;
F: {col0: int,col1: bytearray,col2: bytearray,col3: bytearray}
{code}
{code}
grunt> SORT = order IN by col2 .. col3, col0, col4 ..;
{code}
{code}
J = join IN1 by $0 .. $3, IN2 by $0 .. $3;
{code}
{code}
g = group l1 by b .. c;
{code}
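The order statement inside a nested foreach block can be sketched the same way. This is an illustration under assumptions: the file 'input.txt' and the fields a, b, c, d of l1 are hypothetical.
{code}
-- Hypothetical schema for l1 (path and field names are assumed).
l1 = load 'input.txt' as (a, b, c, d);
g = group l1 by a;
f = foreach g {
    -- Sort each group's bag on columns b through c
    sorted = order l1 by b .. c;
    generate group, sorted;
};
{code}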
Limitations:
There are some restrictions on the use of the project-to-end form of project-range (e.g., "x .. ") when the input schema is null (unknown). These are the same cases in which the use of project-star ('*') is restricted.
1. In cogroup/group statements, the project-to-end form of project-range is allowed only if the input has a schema.
2. In an order-by statement, if the input schema is null, the project-to-end form of project-range is supported only as the last sort column.
Example -
{code}
grunt> describe IN;
Schema for IN unknown.
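-- The following statement is supported: the project-to-end form ($6 ..) is the last sort column
SORT = order IN by $2 .. $3, $6 ..;
-- The following statement is NOT supported: the project-to-end form ($2 ..) is not the last sort column
SORT = order IN by $2 .., $6;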
{code}
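For the first limitation, a sketch of the case that is allowed, where the input has a schema. The file 'input.txt' and the field names a, b, c, d are assumptions for illustration.
{code}
-- l1 is loaded with a schema (path and field names are assumed).
l1 = load 'input.txt' as (a, b, c, d);
-- Because the schema is known, project-to-end can be resolved: the group keys are b, c, d
g = group l1 by b ..;
{code}
Without the "as (...)" clause the schema of l1 would be unknown, and per limitation 1 the same group statement would not be allowed.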