Description
The semantics of TOKENIZE are unclear. In its current form, TOKENIZE takes a string as input and returns a bag. The bag contains one tuple per token, and each tuple holds a single token. A better approach would be to return a single tuple (instead of a bag) with one field per token.
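To make the difference between the two shapes concrete, here is a minimal sketch in plain Python. It is not Pig code: a bag is modelled as a list, a tuple as a Python tuple, and splitting on whitespace is an assumption made only for illustration. The function names are hypothetical.

```python
def tokenize_as_bag(text):
    """Shape described as the current behaviour: a bag (modelled as a list)
    holding one single-field tuple per token."""
    # Assumption: whitespace splitting stands in for the real delimiter rules.
    return [(token,) for token in text.split()]


def tokenize_as_tuple(text):
    """Shape proposed in this issue: one tuple with one field per token."""
    return tuple(text.split())


if __name__ == "__main__":
    print(tokenize_as_bag("hello big world"))    # [('hello',), ('big',), ('world',)]
    print(tokenize_as_tuple("hello big world"))  # ('hello', 'big', 'world')
```

With the bag form, each token has to be reached through its own single-field tuple. With the tuple form, tokens are addressable directly by position.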
On a secondary note, the outputSchema method in TOKENIZE is incorrect. It declares the return type as a plain string, whereas it should declare a bag containing a tuple that in turn contains a string.
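The schema mismatch can be illustrated with a small sketch, again in plain Python rather than against the Pig API. The `describe` helper is hypothetical, and the schema strings only imitate Pig's notation (`{}` for a bag, `()` for a tuple, `chararray` for a string). The value of `declared_schema` reflects what this issue reports, not a check against the actual source.

```python
def describe(value):
    """Derive a Pig-like schema string from a value modelled in Python.

    Assumptions: a list models a bag, a tuple models a Pig tuple,
    and a str models a chararray.
    """
    if isinstance(value, list):
        # A bag is described by the schema of its first tuple, if any.
        inner = describe(value[0]) if value else "()"
        return "{" + inner + "}"
    if isinstance(value, tuple):
        return "(" + ", ".join(describe(field) for field in value) + ")"
    if isinstance(value, str):
        return "chararray"
    raise TypeError("unsupported value: %r" % (value,))


if __name__ == "__main__":
    actual_output = [("hello",), ("world",)]  # shape TOKENIZE really returns
    declared_schema = "chararray"             # what outputSchema reports, per this issue
    print(describe(actual_output))                     # {(chararray)}
    print(describe(actual_output) == declared_schema)  # False
```

The derived schema of the actual output is a bag of single-field tuples, which does not match a declared plain string.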