Pig / PIG-683

Semantics of TOKENIZE are not clear


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 0.2.0
    • Fix Version/s: 0.2.0
    • Component/s: impl
    • Labels: None

    Description

      The semantics of TOKENIZE are not clear. As currently implemented, TOKENIZE takes a string as input and returns a bag containing one tuple per token, where each tuple holds a single token. A better approach would be to return a single tuple (instead of a bag) with as many fields as there are tokens.
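
The two result shapes discussed above can be sketched in plain Python, with a list standing in for a bag. This is only an illustration: the function names are hypothetical, and splitting on whitespace is an assumption, since the issue does not say which delimiters TOKENIZE uses.

```python
def tokenize_as_bag(text):
    """Shape described as current: a bag with one single-field tuple per token."""
    # Assumption: tokens are separated by whitespace.
    return [(token,) for token in text.split()]


def tokenize_as_tuple(text):
    """Shape proposed in the issue: one tuple with one field per token."""
    return tuple(text.split())


line = "hello pig world"
bag_result = tokenize_as_bag(line)      # three tuples, one field each
tuple_result = tokenize_as_tuple(line)  # one tuple, three fields
```

With the bag shape the number of rows varies with the input while every row has the same width; with the tuple shape there is always one row, but its width varies with the input.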

      On a secondary note, the outputSchema method in TOKENIZE is broken: it declares just a string, but it should declare a bag of tuples, each containing a single string.
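
The schema mismatch can be illustrated with a small Python model. This is not Pig's Schema API: the nested-pair representation and the `describe` helper are hypothetical, and `chararray` is used here as the name of Pig's string type.

```python
def describe(schema):
    """Render a (kind, inner) schema description as a readable string."""
    kind, inner = schema
    if kind == "bag":
        # A bag wraps a single tuple schema.
        return "bag{" + describe(inner) + "}"
    if kind == "tuple":
        # A tuple wraps a list of field schemas.
        return "tuple(" + ", ".join(describe(field) for field in inner) + ")"
    return kind  # scalar type such as chararray


# What the issue says outputSchema reports: a bare string.
reported_schema = ("chararray", None)
# What the issue says it should report: a bag of single-string tuples.
expected_schema = ("bag", ("tuple", [("chararray", None)]))

reported_text = describe(reported_schema)
expected_text = describe(expected_schema)
```

Comparing the two rendered strings makes the missing bag and tuple layers in the reported schema easy to see.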


          People

            Assignee: Unassigned
            Reporter: Santhosh Muthur Srinivasan (sms)
            Votes: 0
            Watchers: 0
