Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
0.14.0, 0.15.0
-
None
-
None
-
PigStorage tagPath tagFile
Description
when using the following script:
a= LOAD 'data.csv' USING PigStorage('\t','-tagPath') AS (filepath:chararray, f1:chararray, f2:chararray); b = FOREACH a GENERATE filepath, f2; dump b;
The output will contain the data from filepath and from f1 fields instead of f2 field.
This is caused because of a bug within PigStorage (it also happens in CSVExcelStorage) where it doesn't take the tagPath/tagFile into account when calculating requiredColumns index:
PigStorage.java
if (mRequiredColumns==null || (mRequiredColumns.length>fieldID && mRequiredColumns[fieldID])) addTupleValue(mProtoTuple, buf, start, i);
but fieldID doesn't take the tagFile/tagPath column into account.