Uploaded image for project: 'Pig'
  1. Pig
  2. PIG-1368

Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened for corner cases

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Resolved
    • Major
    • Resolution: Duplicate
    • 0.7.0
    • None
    • None
    • None

    Description

      Consider the following data:
      1\t ( hello , bye ) \n
      1\t( hello , bye )a\n
      2 \t (good , bye)\n

      The following script gives the results below:
      a = load 'junk' as (i:int, t:tuple(s:chararray, r:chararray)); dump a;

      (1,( hello , bye ))
      (1,( hello , bye ))
      (2,(good , bye))

      The current bytesToTuple implementation discards leading and trailing characters before the tuple delimiters and parses the tuple out - I think instead it should treat any leading and trailing characters (including space) near the delimiters as an indication of a malformed tuple and return null.

      Also in the code, consumeBag() should handle the special case of {} and not delegate the handling to consumeTuple().

      In consumeBag() null tuples should not be skipped.

      Attachments

        Activity

          People

            Unassigned Unassigned
            pkamath Pradeep Kamath
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: