Details
-
Bug
-
Status: Closed
-
Minor
-
Resolution: Fixed
-
None
-
None
-
None
-
Reviewed
Description
When multiple directories are passed to PigStorage(',','-schema'), pig will
No attempt to merge conflicting schemas is made during loading. The first schema encountered during a file system scan is used.
For two directories input with schema
file1: (f1:chararray, f2:int) and
file2: (f1:chararray, f2:int, f3:int)
Pig will pick the first schema from file1 and only allow f1, f2 access.
However, output would still contain 3 fields for tuples from file2. This later leads to complete corrupt outputs due to shifted fields resulting in incorrect references.
(This may also happen when input itself contains the delimiter.)
If file2 schema is picked, this is already handled by filling the missing fields with null. (PIG-3100)