[PIG-683] Semantics of TOKENIZE are not clear - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Invalid
Affects Version/s: 0.2.0
Fix Version/s: 0.2.0
Component/s: impl
Labels: None

Description

The semantics of TOKENIZE are unclear. In its current form, TOKENIZE takes a string as input and returns a bag containing one tuple per token, where each tuple holds a single token. A better approach would be to return a single tuple (instead of a bag) with one element per token.
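
To make the two shapes concrete, here is a minimal sketch in plain Python rather than Pig itself. The function names `tokenize_as_bag` and `tokenize_as_tuple` are hypothetical, a Python list stands in for a Pig bag, and splitting on whitespace is a simplifying assumption about how tokens are delimited.

```python
def tokenize_as_bag(text):
    """Model of the behaviour described above: a bag (here a list)
    holding one single-field tuple per token."""
    return [(token,) for token in text.split()]


def tokenize_as_tuple(text):
    """Model of the proposed behaviour: one tuple with one field per token."""
    return tuple(text.split())


line = "hello pig world"
print(tokenize_as_bag(line))    # [('hello',), ('pig',), ('world',)]
print(tokenize_as_tuple(line))  # ('hello', 'pig', 'world')
```

In this model the bag form always yields single-field tuples, however many tokens the input has, while the width of the tuple form depends on the input string.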

On a secondary note, the outputSchema method in TOKENIZE is broken. It should return a schema for a bag whose tuple contains a string, not a schema for just a string.

People

Assignee: Unassigned

Reporter: Santhosh Muthur Srinivasan

Votes: 0

Watchers: 0

Dates

Created: 19/Feb/09 18:58

Updated: 24/Mar/10 22:04

Resolved: 19/Feb/09 19:20