I was thinking we could make the syntax part of FOREACH.
B = FOREACH A GENERATE a, b, c ASSERT a >= 0, b IS NOT NULL;
That way it is easy to integrate asserts in the flow.
The advantages of making it part of the language:
- the error message can be clear without extra user input.
- it's more natural than doing a filter that does not filter. Also, if the filter is not among the predecessors of a STORE, it won't be executed.
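To illustrate the last point, here is a rough sketch. It assumes a user-defined ASSERT UDF like the one in the UDF-based syntax quoted in this comment; the relation names, schema, and paths are made up for the example.

```
-- Hypothetical input; 'members' and its schema are illustrative only.
members = LOAD 'members' AS (member_id:long, name:chararray);

-- The assert is expressed as a filter that is expected to keep every row.
checked = FILTER members BY ASSERT( (member_id >= 0 ? 1 : 0), 'Doh! Some member ID is negative.' );

-- The rest of the script reads from 'members', not from 'checked'.
names = FOREACH members GENERATE name;

-- Only the predecessors of this STORE are executed, so 'checked' never runs
-- and a negative member_id would go unnoticed.
STORE names INTO 'output/names';
```

For the check to run, every downstream statement has to read from checked instead of members, which is easy to get wrong when the assert is added to an existing script.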
A UDF can stop the job by throwing an exception, although the task will be retried before the job fails completely.
For reference, the UDF based syntax:
FILTER members BY ASSERT( (member_id >= 0 ? 1 : 0), 'Doh! Some member ID is negative.' );
Yes, adding new keywords is inconvenient when the keyword is already used as a relation or column name.
When a field collides with a keyword, it is sometimes difficult to rename it.
I think we should:
- try to avoid new keywords if possible
- provide a mechanism to escape field names to facilitate fixing conflicts when they happen (using quotes or a similar mechanism)