PIG-3367 the ASSERT keyword was created.
The current implementation allows for checking in each record in the bag if the value of a column is valid (and fail the job if it is not).
We did several experiments and found that an empty bag (0 tuples) always succeeds. We need to ensure that a bag has been loaded correctly.
- Allow the ASSERT statement to check if a bag is empty.
- Allow the ASSERT statement to check if a bag has more than (or less than) a specific number of tuples.
- For me this may be an approximating implementation. i.e. if I say it must have at least 5M tuples then it may still return 'is valid' if it has 4.9M tuples.
NOTE: The syntax I show is just to give an idea on what I want to do.