[PIG-1368] Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened for corner cases - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: 0.7.0
Fix Version/s: None
Component/s: None
Labels: None

Description

Consider the following data:
1\t ( hello , bye ) \n
1\t( hello , bye )a\n
2 \t (good , bye)\n

The following script gives the results below:
a = load 'junk' as (i:int, t:tuple(s:chararray, r:chararray)); dump a;

(1,( hello , bye ))
(1,( hello , bye ))
(2,(good , bye))

The current bytesToTuple implementation discards any characters that precede or follow the tuple delimiters and then parses the tuple out. I think it should instead treat any leading or trailing characters (including spaces) around the delimiters as a sign of a malformed tuple and return null.
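The stricter behavior proposed here can be illustrated with a minimal sketch. This is not Pig's actual code: the class name `StrictTupleSketch` and the method `parseTupleStrict` are hypothetical, the sketch handles only flat tuples of strings, and it assumes that whitespace inside the parentheses is kept as part of the field values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Utf8StorageConverter implementation.
class StrictTupleSketch {

    // Returns the tuple's fields, or null when the text does not start with
    // '(' and end with ')' exactly (any stray character, including a space,
    // outside the delimiters makes the tuple malformed).
    static List<String> parseTupleStrict(String field) {
        if (field == null || field.length() < 2) {
            return null;
        }
        if (field.charAt(0) != '(' || field.charAt(field.length() - 1) != ')') {
            return null;
        }
        String inner = field.substring(1, field.length() - 1);
        if (inner.isEmpty()) {
            return new ArrayList<>(); // "()" is an empty tuple
        }
        // Assumption: flat tuple, fields separated by commas, no nesting.
        return new ArrayList<>(Arrays.asList(inner.split(",", -1)));
    }

    public static void main(String[] args) {
        // The tuple fields from the three sample rows above.
        System.out.println(parseTupleStrict(" ( hello , bye ) ")); // null
        System.out.println(parseTupleStrict("( hello , bye )a"));  // null
        System.out.println(parseTupleStrict(" (good , bye)"));     // null
        // A well-formed tuple is still parsed.
        System.out.println(parseTupleStrict("(good , bye)"));
    }
}
```

Under this sketch, the tuple field in each of the three sample rows is rejected, because each one has a space or another character outside the parentheses.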

Also, in the code, consumeBag() should handle the special case of {} itself rather than delegating it to consumeTuple().

In consumeBag(), null tuples should not be skipped.
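A rough sketch of the two bag-related points follows, under stated assumptions. The names `StrictBagSketch`, `parseBagStrict`, and `parseTupleStrict` are hypothetical and do not mirror the real consumeBag() or consumeTuple() signatures. The sketch treats `{}` as an empty bag before any tuple parsing happens, and it keeps a null entry for every malformed tuple instead of dropping it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the Utf8StorageConverter implementation.
class StrictBagSketch {

    // Flat-tuple parser: null unless the text is exactly "(...)".
    static List<String> parseTupleStrict(String s) {
        if (s.length() < 2 || s.charAt(0) != '(' || s.charAt(s.length() - 1) != ')') {
            return null;
        }
        String inner = s.substring(1, s.length() - 1);
        if (inner.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(inner.split(",", -1)));
    }

    // Returns the bag's tuples, or null when the bag delimiters are malformed.
    static List<List<String>> parseBagStrict(String s) {
        if (s == null || s.length() < 2
                || s.charAt(0) != '{' || s.charAt(s.length() - 1) != '}') {
            return null;
        }
        List<List<String>> bag = new ArrayList<>();
        String inner = s.substring(1, s.length() - 1);
        if (inner.isEmpty()) {
            return bag; // "{}" is handled here, without calling the tuple parser
        }
        int depth = 0; // parenthesis depth, so commas inside a tuple do not split elements
        int start = 0;
        for (int i = 0; i <= inner.length(); i++) {
            boolean atEnd = (i == inner.length());
            char c = atEnd ? ',' : inner.charAt(i);
            if (!atEnd && c == '(') {
                depth++;
            } else if (!atEnd && c == ')') {
                depth--;
            }
            if (atEnd || (c == ',' && depth == 0)) {
                // A malformed element yields null, and the null is kept in the bag.
                bag.add(parseTupleStrict(inner.substring(start, i)));
                start = i + 1;
            }
        }
        return bag;
    }

    public static void main(String[] args) {
        System.out.println(parseBagStrict("{}"));
        System.out.println(parseBagStrict("{(a,b),x(c,d)}"));
    }
}
```

Keeping the null entry means the number of elements in the bag still matches the input, so a downstream consumer can see that a tuple was present but unparseable.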

People

Assignee: Unassigned

Reporter: Pradeep Kamath

Votes: 0

Watchers: 0

Dates

Created: 08/Apr/10 18:46

Updated: 12/Jul/10 22:30

Resolved: 12/Jul/10 22:30